
In re Application of: Pietrokovski et al Examiner: Ogunbiyi 

Serial No.: 10/534,544 Group Art Unit: 1645 

Filed: May 10, 2005 Attorney Docket: 29489 

Office Action Mailing Date: Dec 18, 2008 

REMARKS 

Reconsideration of the above-identified application in view of the amendments 
above and the remarks following is respectfully requested. 

Claims 1 and 5-121 are in this Application. Claims 19-121 have been 
withdrawn from consideration. Claims 1 and 5-18 have been rejected. Claims 10 and 
16 have now been amended. 



Rejections Maintained 
The Examiner has maintained the rejection of claims 16-18 under 35 USC 112,
Second Paragraph. 

The Examiner states that these claims are vague and indefinite as to which part 
of a virus or a cell the affinity tag binds to. 

The instant specification provides several specific examples of affinity tags 
that can be used by the present invention, including "streptavidin, His-tags, strep-tags, 
epitope tags, maltose-binding proteins, and chitin-binding domains." Such examples 
clearly demonstrate the type of tags that can be used with the present invention and 
clearly identify, to the ordinary skilled artisan, the part or parts of the virus or cell that 
would be bound by such tags. 

A detailed description regarding use of such affinity tags is provided in sections
[0109]-[0118] of the published application.

In addition, it should be noted that affinity tags are well known in the art and
are routinely used for affinity-based purification. The present invention does not teach
or suggest novel affinity tags or novel uses for such tags, but rather describes
art-acceptable use of affinity tags in combination with the autoprocessing segment of the
present invention. In light of the description and examples provided in the instant
specification and the wealth of knowledge available in the art, Applicant is of the
opinion that one of ordinary skill in the art would be more than capable of modifying
the present invention to include affinity tags suitable for purification of any cell or
virus desired.




Notwithstanding the above and in the interest of expediting prosecution
of this case, Applicant has elected to amend claim 16 to recite "wherein said molecule 
is displayed on a virus or a cell". Support for such an amendment is provided 
throughout the instant specification, see for example, sections [0172]-[0174]. 



The Examiner has rejected claim 16 under 35 USC 112, First Paragraph, as 
failing to comply with the written description requirement.

The Examiner states that the specification does not provide support for the binding
of a molecule that forms a part of a virus or cell.

Claim 16 has now been amended to clarify that said molecule is displayed on 
the virus or cell. Support is provided by sections [0172]-[0174] of the published 
application. 

The Examiner also rejected claims 1 and 5-18 under 35 USC 112, First 
Paragraph, as failing to comply with the written description requirement. 

The Examiner states that the sequence set forth by SEQ ID NO: 31
encompasses an extremely large number of different species because of the variability
allowed in the sequence.

The Examiner further states that the specification teaches in Example 4 page 
56 that a chimeric protein which comprises the type A BIL domain BIL4_cloth (SEQ 
ID NO:31) has the capacity to efficiently display auto splicing and carboxy terminal- 
auto-cleaving, but does not set forth which variant was used for the experiment, i.e. its 
specific sequence. 

The Examiner fails to understand that SEQ ID NO:31 embodies a specific 
sequence of one type of an autoprocessing segment. 



New Rejections 
35 U.S.C. § 112, First Paragraph Rejections 




By aligning numerous sequences, the present inventor clearly demonstrated 
that BIL domains share consensus sequence motifs that are spaced apart by regions of 
great variance in amino acid sequences (see Figures 3a-g). 

In SEQ ID NO:31 (as with any BIL domain) intervening sequences (designated 
by Xaa in SEQ ID NO:31) are not a part of the autoprocessing-defining sequence. 
These sequences can include any amino acid sequence, since their only function is to
space apart the sequences that define autoprocessing. Since the sequence of these 
'spacers' does not contribute to autoprocessing, such sequences can be embodied by 
any combination of any amino acids, as is set forth by SEQ ID NO:31. 
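
By way of illustration only, the structure described above (conserved motifs separated by Xaa spacers of arbitrary composition) can be sketched as a pattern match. The motif strings and spacer lengths used below are hypothetical placeholders chosen for the example; they are not taken from SEQ ID NO:31 or from any sequence in the application.

```python
import re

def build_pattern(motifs, spacers):
    """Compile a regex for conserved motifs separated by variable spacers.

    motifs  -- list of motif strings; the letter 'X' stands for any residue
    spacers -- list of (min_len, max_len) tuples, one per gap between motifs
    """
    if len(spacers) != len(motifs) - 1:
        raise ValueError("need exactly one spacer range between each pair of motifs")
    parts = []
    for i, motif in enumerate(motifs):
        # Conserved positions must match exactly; 'X' positions match any residue.
        parts.append("".join("[A-Z]" if ch == "X" else re.escape(ch) for ch in motif))
        if i < len(spacers):
            lo, hi = spacers[i]
            # The spacer is constrained in length only, not in composition.
            parts.append("[A-Z]{%d,%d}" % (lo, hi))
    return re.compile("".join(parts))

# Hypothetical motifs and spacer lengths (assumptions for illustration only).
MOTIFS = ["CLAGDT", "TXNH", "HN"]
SPACERS = [(3, 6), (2, 4)]
PATTERN = build_pattern(MOTIFS, SPACERS)

def has_autoprocessing_motifs(sequence):
    """Return True if the sequence carries all motifs with allowed spacing."""
    return PATTERN.search(sequence.upper()) is not None
```

Under these assumptions, two sequences that differ only in their spacer residues both satisfy the pattern, whereas a sequence with an altered motif does not.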

With respect to specific sequences, the instant application identifies 
BIL4_cloth as one specific example of an autoprocessing segment that includes SEQ 
ID NO:31 (see section [0103] of the published application). The specific amino acid 
sequence of BIL4_cloth is provided by Table 1 as represented by coordinates 311-345
of contig 23020817. In addition, sections [0188]-[0193] describe construction of the 
specific sequences used in the experiments. The sequence region containing the 
autoprocessing segment which was PCR amplified using the primer sequences listed 
in these sections is described in Table 1 of the instant application. 

The Examiner has also rejected claims 1 and 5-18 under 35 USC 112, First 
Paragraph, as failing to comply with the enablement requirement. 

The Examiner states that SEQ ID NO:31 encompasses an extremely large 
number of different species because of the variability allowed in the sequence. 

The Examiner also states that it is well known that amino acid substitutions 
anywhere in a protein including in regions not required for activity can affect protein 
structure and function. The Examiner concludes that therefore, undue 
experimentation would be required of the skilled artisan to make and use the instant 
invention as claimed. 




As is stated above, SEQ ID NO:31 is a specific example of an autoprocessing 
sequence which includes specific sequence motifs separated by highly variable regions 
of a specific length. 

Variability in 'spacer' regions characterizes autoprocessing domains. For 
example, the Inteins and Hog Hint domains of the Hint domain family are 130-160 
amino acids long and share 4-6 conserved sequence motifs linked via variable 
intervening (spacer) regions. Although BIL domains belong to the Hint domain
family, they are distinguished from Inteins and Hog Hints in that they include highly 
variable 'spacers' since the BIL domains are integrated within non-conserved, hyper- 
variable proteins (see, http://bioinfo.weizmann.ac.il/~pietro/Hints/). In addition, and 
as was noted in the previous response, BIL domains are also unique in that they do not 
require additional flanking sequences for autoprocessing. 

Thus, the functional structure of BIL domains includes conserved sequence 
motifs separated by hypervariable intervening sequences (as exemplified by SEQ ID 
NO:31). 

Studies published following the priority date of the instant application have 
further substantiated this structure-function relationship of BIL domains by showing 
that the autoprocessing function of such domains does not depend on the sequence
identity of the 'spacer' regions.

For example, Amitai et al. ["Distribution and function of new bacterial intein-like
protein domains" Molecular Microbiology 47:61-73 (2003)] and Dassa et al.
["Protein splicing and auto-cleavage of bacterial intein-like domains lacking a
C-flanking nucleophilic residue" J Biological Chemistry 279:32001-32007 (2004)]
provide theoretical and computational evidence for the ability of BIL domains to
autoprocess with different 'spacers'. Amitai et al. provide computational evidence by
showing the presence of diverse BIL spacers in BIL domains having similar or 
identical motifs while Dassa et al. constructed a detailed chemical model which 
illustrates how BIL domains are capable of autoprocessing with diverse spacer 
sequences. 






35 U.S.C. § 112, Second Paragraph Rejections 



The Examiner has rejected claim 10 under 35 USC 112, Second Paragraph as
being indefinite. 

The Examiner states that claim 10 depends from claim 5 and recites the 
limitation "wherein said segment of the polypeptide adjacent to said amino terminal 
end of said autoprocessing segment". The Examiner states that there is insufficient 
antecedent basis for this limitation. 

Claim 10 has now been amended to clearly identify the polypeptide and 
polypeptide segments referred to. 



In view of the above amendments and remarks it is respectfully submitted that 
claims 1 and 5-18 are now in condition for allowance. A prompt notice of allowance is
respectfully and earnestly solicited. 




Respectfully submitted, 



Martin D. Moynihan 
Registration No. 40,338 



Date: October 13, 2009 



Enclosures: 



• Petition to Revive an Unintentionally Abandoned Application 

• Request for Continued Examination (RCE) 

• Two References (Amitai et al. and Dassa et al.) 



The Journal of Biological Chemistry

© 2004 by The American Society for Biochemistry and Molecular Biology, Inc. 



Vol. 279, No. 31, Issue of July 30, pp. 32001-32007, 2004 

Printed in U.S.A.



Protein Splicing and Auto-cleavage of Bacterial Intein-like 
Domains Lacking a C'-flanking Nucleophilic Residue*[S]

Received for publication, April 26, 2004, and in revised form, May 17, 2004 
Published, JBC Papers in Press, May 18, 2004, DOI 10.1074/jbc.M404562200 

Bareket Dassa, Haim Haviv, Gil Amitai, and Shmuel Pietrokovski‡

From the Department of Molecular Genetics, the Weizmann Institute of Science, Rehovot, Israel 76100 



Bacterial intein-like (BIL) domains are newly identified homologs of intein
protein-splicing domains. The two known types of BIL domains together with inteins
and hedgehog (Hog) auto-processing domains form the Hog/intein (HINT) superfamily.
BIL domains are distinct from inteins and Hogs in sequence, phylogenetic
distribution, and host protein type, but little is known about their biochemical
activity. Here we experimentally study the auto-processing activity of four BIL
domains. An A-type BIL domain from Clostridium thermocellum showed both
protein-splicing and auto-cleavage activities. The splicing is notable, because this
domain has a native Ala C'-flanking residue rather than a nucleophilic residue,
which is absolutely necessary for intein protein splicing. B-type BIL domains from
Rhodobacter sphaeroides and Rhodobacter capsulatus cleaved their N' or C' ends. We
propose an alternative protein-splicing mechanism for the A-type BIL domains. After
an initial N-S acyl shift, creating a thioester bond at the N' end of the domain, the
C' end of the domain is cleaved by Asn cyclization. The resulting amino end of the
C'-flank attacks the thioester bond next at the N' end of the domain. This
aminolysis step splices the two flanks of the domain. The B-type BIL domain cleavage
activity is explained in the context of the canonical intein protein-splicing
mechanism. Our results suggest that the different HINT domains have related
biochemical activities of proteolytic cleavages, ligation and splicing. Yet the
predominant reactions diverged in each HINT type according to their specific
biological roles. We suggest that the BIL domain cleavage and splicing reactions are
mechanisms for post-translationally generating protein variability, particularly in
extracellular bacterial proteins.



Bacterial intein-like (BIL)¹ domains are newly identified
protein homologs of intein protein-splicing domains (1). The
two known types of BIL domains together with inteins and
hedgehog-like (Hog) auto-processing domains form the HINT
(Hog/intein) domain superfamily (2). Inteins and Hogs have
related auto-catalytic protein-processing activities. Hog domains
rearrange their N'-peptide bond into a thioester bond.
This thioester is cleaved by a nucleophilic attack of a cholesterol
molecule bound by a downstream domain (3, 4). A similar
nucleophilic attack occurs during the protein splicing of inteins
out of their protein hosts. The rearranged ester/thioester bond
at the intein N' end is attacked by the nucleophilic side chain
of the intein C'-flanking residue followed by additional splicing
reactions (5). Intein protein splicing thus depends on an invariable
Cys, Ser, or Thr nucleophilic C'-flanking residue (+1) for
the trans-esterification and acyl rearrangement steps (2, 6).

* The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
"advertisement" in accordance with 18 U.S.C. Section 1734 solely to
indicate this fact.

[S] The on-line version of this article (available at http://www.jbc.org)
contains supplemental Figs. S1-S3 and Tables S-I and S-II.

‡ To whom correspondence should be addressed. Tel.: 972-8-9342747;
Fax: 972-8-9344108; E-mail: shmuel.pietrokovski@weizmann.ac.il.

¹ The abbreviations used are: BIL, bacterial intein-like; B, BIL; Rsp,
Rhodobacter sphaeroides; Rca, Rhodobacter capsulatus; Cth, C. thermocellum;
HINT, Hog/intein; MS, mass spectrometry; MALDI, matrix-assisted
laser desorption/ionization; M, maltose-binding protein; C,
chitin-binding domain.

BIL domains are distinct from inteins and Hogs in sequence, 
phylogenetic distribution, and host protein type (1). Each of the 
two BIL types has characteristic and unique sequence features 
that cluster them separately from other HINT types. Although 
inteins are integrated in highly conserved sites of essential 
proteins and Hogs are present in hedgehog and related nema- 
tode proteins, BIL domains are integrated in variable regions of 
non-conserved diverse bacterial proteins, some of which have 
extracellular motifs. This leads to the hypothesis that BIL 
domains may have biological roles different from those of other 
HINT domains (1). Yet little is known regarding the biochem- 
ical activity of each BIL type. 

We previously described (1) the catalytic activity of an A-type
and a B-type BIL domain. The A-type BIL domain was shown
to have protein-splicing and C'-cleavage activities. However,
this domain was naturally flanked by a Thr +1 residue, which
is typical of inteins but not of A-type BIL domains. Only 15% of
known A-type BIL domains are followed by Ser or Thr, and none
is followed by Cys residues. An A-type BIL domain with a +1 Tyr
residue was shown recently by Southworth et al. (7) to have
N'-terminal cleavage but no protein-splicing activity. The B-type
BIL domain was examined previously by us only in a
cell-free system. It was shown to be active with preliminary
evidence for cleavage and protein splicing. Peptide splicing
outside the context of intein-like domains also was shown recently
to occur in the proteasome, generating variant peptides
to be displayed on major histocompatibility complex class I
proteins (8).

Here we examine in detail the auto-cleavage and splicing 
activity of four BIL domains: one A-type BIL domain with a 
native non-nucleophilic C'-flanking residue (Ala +1) and three
different B-type BIL domains. We also show that BIL domains 
are present in more major groups of bacteria and in proteins 
likely to be secreted. The probable functions and chemical 
reaction mechanisms of BIL domains and their relation to 
inteins are discussed. 

EXPERIMENTAL PROCEDURES 

Bacterial Strains and DNA Primers — Rhodobacter sphaeroides 2.4.1 
(Rsp) genome was a kind gift from Dr. Steven L. Porter (University of 
Oxford). Rhodobacter capsulatus (Rca) MD1 genome was a kind gift 
from Dr. Fevzi Daldal (University of Pennsylvania), and Clostridium 
thermocellum (Cth) genome was a kind gift from Dr. Ying Tsai (Uni- 
versity of Rochester). The following BIL domains were cloned: BIL4-Cth 



This paper is available on line at http://www.jbc.org 






Activity of Intein-like Domains Lacking a C -nucleophile 

Table I

Primer name

BIL1-Rsp
    5p-Nrsp-bil1
    3p-Nrsp-bil1
BIL2-Rsp + flanks
    5p-rsph-bil2
    3p-rsph-bil2
BIL2-Rsp-no flanks
    5p-rsp2-bil-only+1
    3p-rsp2-only+1
BIL4-Cth
    5p1.BIL4 Cth
    3p408.BIL4 Cth
1522-Rca
    5pBIL 1522-108bp
    3pBIL 1522+105bp

a Number of residues flanking the BIL domain.

(NCBI gi code 23020817); BIL1-Rsp (NCBI gi code 22959584); BIL2-Rsp
(NCBI gi code 22959191); and 1522-Rca (1). The BIL domains were
amplified by PCR using the primers in Table I and cloned between two
protein tags in a plasmid termed pC2C (as described by Amitai et al.
(1)). This plasmid is a modification of the pMALC2 vector (New England
BioLabs, Beverly, MA) containing the malE gene for maltose-binding
protein (M) from Escherichia coli and a downstream cbd gene coding for
chitin-binding domain (C) from Bacillus circulans.

Functional Assay of Protein-splicing and Cleavage Activity — The cod- 
ing sequence of different BIL domains (B) was cloned in-frame between 
two protein tags, the maltose-binding protein (M) upstream and the 
chitin-binding domain (C) downstream. The chimeric protein, M-B-C, 
was overexpressed and extracted in E. coli bacteria as described previously
(1). Protein extraction buffer contained 20 mM Tris, pH 7.4, 200
mM NaCl, 1 mM EDTA, and 1 mM sodium azide.

Purification of Tagged Protein Products — Soluble protein products 
containing either a C or a M tag were purified on affinity columns using 
chitin (New England Biolabs) or amylose (New England Biolabs) beads,
respectively. Lysed cell supernatant in extraction buffer was applied to 
beads for 1 h at 4 °C with shaking. Elution of proteins from chitin beads 
was done by mixing the beads with SDS-PAGE sample loading buffer 
and boiling for 2-3 min. Extraction buffer with 10 mM maltose was used 
to elute proteins from amylose beads. 

Heat Purification of BIL4-Cth Domain — The supernatant of E. coli
cell lysate overexpressing the BIL4-Cth construct was heated in extraction
buffer to 37-80 °C for 20 min. Soluble proteins were separated
from the denatured ones by centrifugation at 13,000 rpm for 3 min and
applied on an SDS gel.

In Vitro Protein Transcription/Translation — In vitro transcription/translation
was carried out using E. coli S30 extract for circular DNA
system (Promega, Madison, WI) as described by Amitai et al. (1).

Western Blot Analysis — Western blot analysis was used to identify 
protein products containing either the M or C tag and to identify the 
GroEL and DnaK protein chaperones. To identify the M tag, monoclonal
mouse antibodies directed at maltose-binding protein (Novus Biologi- 
cals, Littleton, CO) were used in a 1:800 ratio. To identify the C tag, 
polyclonal rabbit antibodies directed at CBD (New England Biolabs) 
were used in a ratio of 1:5000. Antibodies for GroEL were a kind gift 
from Prof. Amnon Horovitz (rabbit antibodies, used in a ratio of 1:1000), 
and DnaK mouse antibodies (Stressgen) were used in a ratio of 1:1000. 
The secondary antibodies used were horseradish peroxidase-conjugated 
goat anti-mouse IgG or goat anti-rabbit IgG (Jackson ImmunoResearch
Laboratories, West Grove, PA) in a ratio of 1:10000. Chemiluminescence
detection was performed using SuperSignal (Pierce) according to the
manufacturer's protocol. 

Mass Spectrometry (MS) Methods — Intact molecular weight meas- 
urements and peptide mass mapping by matrix-assisted laser desorp- 
tion/ionization (MALDI) MS were performed at the Weizmann Institute 
Biological Mass Spectrometry unit and at the Smolar Center for proteins
(Technion, Israel). Electroelution from gel followed by in-gel digestion
with trypsin, chymotrypsin, or V8 proteases was performed and
analyzed as described previously (37). 

N-terminal Amino Acid Sequencing — Proteins were electrophoresed
by SDS-PAGE, and selected bands were prepared as described by Amitai
et al. (9) and subjected to Edman degradation at the Weizmann
Institute Biological Mass Spectrometry Unit. 



Table I column headings (continued): Restriction site; Flank amino acids (a)



Computational Sequence Analysis — Sequence searches used the 
BLAST programs (10) and the BLIMPS program for block-to-sequence 
searches (11). Block multiple sequence alignments and phylogene- 
tic analysis were conducted as described by Amitai et al. (1). Protein 
motifs were detected using the InterProScan tool
(www.ebi.ac.uk/interpro/scan.html).

RESULTS 

To characterize the proteolytic activity of new A- and B-type 
BIL domains, each BIL domain (B) was cloned in-frame be- 
tween two protein tags, maltose-binding protein (M) upstream 
and chitin-binding domain (C) downstream. Protein products of 
each chimeric gene (M-B-C) were examined in vivo and in vitro 
by various methods. To characterize the BIL domain activity in 
its native protein context, some of the domains were cloned 
with their full or partial native flanks, whereas others were 
cloned only with single residue flanks. 

Protein Splicing and Cleavage of an A-type BIL Domain with
Ala +1 Residue — BIL4-Cth is one of the 23 A-type BIL domains
we identified in the thermophilic bacterium Cth (1). It is typical
of most A-type BIL domains to have all of the intein protein-splicing
active site residues with the exception of the C'-flanking
nucleophile (supplemental Fig. S3). Instead of Cys, Ser, or
Thr invariably present in inteins, BIL4-Cth is followed by an
Ala +1 residue. This is the residue present in 18% of A-type BIL
domains (fraction calculated as weighted average of putative
active domains).

The BIL4-Cth M-B-C precursor was overexpressed in vivo as
a double-tagged protein, and its products were detected and
analyzed. Putative protein-splicing products, the excised BIL
domain and the ligated M-C flanks, and the M-B and M
cleavage products were detected. These products were identified
by Western blotting of total cell lysates and affinity-purified
proteins separated on SDS-PAGE (Fig. 1A). Relative
quantities of products were calculated according to measurements
taken from Coomassie Blue-stained SDS gels of amylose-purified
proteins and total lysates (supplemental Fig. S1).
Only trace amounts of the M-B-C precursor were detected
under all of the separation procedures, indicating efficient
processing. Spliced product M-C comprised 20-25% of the final
products, whereas C'-cleavage product M-B comprised ~5% of
the final products. M and B proteins comprised most of the final
products, indicating that they were generated by a combination
of N'- and C'-cleavages. The final amount of B protein was
much larger than the amount of the M-C splicing product. This
finding implies that both protein splicing and cleavage at its N'
and C' ends released the B protein. The C product was not
identified in the gels, perhaps because of cellular degradation.

To characterize the putative splicing product using MALDI 



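Table I (continued)

Primer name           Primer sequence                     Restriction site   Flank amino acids (a)

BIL1-Rsp
  5p-Nrsp-bil1        GAATTCATGGCTGACCAAATCCAGATCGG       EcoRI              +14
  3p-Nrsp-bil1        TCTAGAGCGGACGAGGACCCTTTCCGGT        XbaI               +52
BIL2-Rsp + flanks
  5p-rsph-bil2        GAATTCGGTGATTCATCCTTGGGGCGA         EcoRI              +32
  3p-rsph-bil2        TCTAGAAAACACGGCAAGGGCGAGCGG         XbaI               +9
BIL2-Rsp-no flanks
  5p-rsp2-bil-only+1  GAATTCCTCTCCCTGACGGCCGGGACG         EcoRI              +1
  3p-rsp2-only+1      TCTAGAGGGCCGGGTCACGGGATGGAG         XbaI               +1
BIL4-Cth
  5p1.BIL4 Cth        AAAAGGATCCTGCTTTGTTGCAGGCACGATG     BamHI
  3p408.BIL4 Cth      AAAATCTAGATGCATTATGCACCAATACTTCAT   XbaI               +1
1522-Rca
  5pBIL 1522-108bp    GGATCCAACTACGATCCGACGAACCC          BamHI              +36
  3pBIL 1522+105bp    TCTAGAACCATAGCCCTCAAGGCCGTC         XbaI               +35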
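
As a consistency check on Table I, each primer can be scanned for the recognition sequence of the restriction enzyme listed for it. The sketch below is illustrative only: the recognition sequences are the standard ones for EcoRI, XbaI, and BamHI, and the pairing of primer names with sequences is an assumption based on the row order of the table.

```python
# Standard recognition sequences for the three enzymes named in Table I.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "BamHI": "GGATCC",
}

# A subset of Table I primers; the name-to-sequence pairing is assumed
# from the row order of the table.
PRIMERS = {
    "5p-Nrsp-bil1": "GAATTCATGGCTGACCAAATCCAGATCGG",
    "3p-Nrsp-bil1": "TCTAGAGCGGACGAGGACCCTTTCCGGT",
    "5pBIL 1522-108bp": "GGATCCAACTACGATCCGACGAACCC",
    "3pBIL 1522+105bp": "TCTAGAACCATAGCCCTCAAGGCCGTC",
}

def sites_in(primer):
    """Return the names of all listed enzymes whose site occurs in the primer."""
    seq = primer.upper().replace(" ", "")  # tolerate stray spaces from scanning
    return [name for name, site in RESTRICTION_SITES.items() if site in seq]

def site_report(primers):
    """Map each primer name to the restriction sites found in its sequence."""
    return {name: sites_in(seq) for name, seq in primers.items()}
```

Calling `site_report(PRIMERS)` returns one list of enzyme names per primer, which can be compared with the restriction site column of the table.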







Fig. 1. Protein-splicing activity of BIL4-Cth. A, Western blotting 
of E. coli overexpression of the M-B-C construct containing A-type
BIL4-Cth domain, using anti-maltose-binding protein (Anti-M) or anti- 
CBD (Anti-C) antibodies. T, total cell lysate; A, amylose affinity column 
eluant; C, chitin affinity column eluant. B, purification of BIL domain 
by heat. Samples of total cell lysate were heated to different tempera- 
tures. Soluble proteins were isolated and separated on SDS-PAGE. 

MS, the M-C band was extracted from the gel and digested with
proteolytic enzymes. The presence of the M and C domains was
verified using MS/MS analyses. Furthermore, two peptide
masses corresponding to splicing-junction peptides were detected
from the chymotrypsin digestion of the M-C band. One
mass corresponded to a fully cleaved (N'-GSASRVDCGGLTGL-C')
peptide, and another mass corresponded to its miscleaved
form (N'-GSASRVDCGGLTGLNSGLTTNPGVSAW-C')
with high mass accuracies (Table II). The ligated splicing
junction is between the second and third residues (Ser-Ala)
with the Ser being coded by a linker joining the M tag to the
BIL domain and the Ala being the native residue downstream
of the BIL domain (Ala +1).

Spliced BIL domain was purified, and its identity was verified
by MS. We were able to purify the BIL domain by heat
treatment, probably because it originated from a thermophilic
bacterium. Incubation of total cell lysate at 80 °C left only the
putative BIL domain in the soluble fraction (Fig. 1B). Intact
mass MS analysis of this 15-kDa band identified the expected
mass of the BIL domain, and its sequence was verified by
MS/MS analysis (see Table IV and data not shown). The exact
C' end of the BIL domain was identified by MS analysis as Asn
as expected (Table III).

A putative C'-cleavage product, M-B, was affinity-purified
and identified by anti-M antibodies (Fig. 1A). Its intact mass
analysis corresponded to the expected mass of a C'-cleavage
product (Table IV). Other masses obtained from this sample
corresponded to the M tag and to other smaller masses that
could result from a cross-contamination of the M-B band by
traces of smaller proteins on the gel.

A protein band corresponding to the M tag was identified by
Coomassie Blue staining and by Western blotting using anti-M
antibodies (Fig. 1A and supplemental Fig. S1). This putative
N'-cleavage product was observed in total lysates and in elutions
of both chitin and amylose affinity columns.




To examine whether the Tris cell extraction and protein 
purification buffer promoted cleavage and splicing of the M- 
B-C precursor, the extraction and purification procedures were 
repeated using different buffers (Bis-Tris propane, HEPES, 
sodium phosphate, and borate). The same products and relative
amounts were observed with all of these control buffers (data
not shown). 

In Vivo and in Vitro Cleavage Activities of B-type BIL Domains —
B-type BIL domains are more heterogeneous in sequence
than A-type domains (1). To characterize their activity,
we cloned three different B-type BIL domains into the double-tagged
system (described above): the two BIL domains present
in R. sphaeroides termed BIL1-Rsp and BIL2-Rsp and one of
the 14 BIL domains present in R. capsulatus termed 1522-Rca.
The conserved C' sequence motif of B-type BIL domains is
distinct from those motifs in other known HINT domains (1).
The C' end of the cloned BIL1-Rsp and 1522-Rca is typical of
B-type BIL domains, whereas BIL2-Rsp has an atypical C' end
(supplemental Fig. S3).

N'-cleavage of B-type BIL1-Rsp — BIL1-Rsp, a B-type BIL
domain from R. sphaeroides, was cloned between M and C tags
with its native N'-14 residue and C'-51 residue flanks and
overexpressed in E. coli cells. The M-B-C precursor and the M and B-C
N'-cleavage products were identified by Coomassie Blue staining
and Western blotting of total lysate and affinity-purified
protein samples (Fig. 2). To verify the nature of the N'-cleavage
product, B-C, the band was micro-sequenced. The resulting
sequence (XFTPGT) corresponded to the predicted N' end of
the BIL domain, which also includes Cys-1, which usually
cannot be detected by this method (supplemental Table S-I).

An additional 58-kDa band was co-purified with the M-B-C 
precursor. Its analysis suggests that the band might include 
more than a single protein species. Both anti-M and anti-C 
antibodies reacted with this band. However, the peptide map- 
ping of the band identified peptides from both the M tag and 
the E. coli GroEL chaperone protein. Additionally, no peptides 
from the B and C domains were identified (data not shown). 
Intact mass of the band identified a mass of 58.317 kDa corre- 
sponding to GroEL and an additional unidentified protein mass 
of 65.175 kDa (Table IV). As a control, we checked a cross- 
reaction of anti-C antibodies with purified GroEL protein (sup- 
plemental Fig. S2B). Anti-C antibodies showed reactivity to- 
ward GroEL, probably because of their polyclonal nature. 

GroEL chaperone was detected in protein samples purified on
the following affinity columns: amylose, chitin, and amylose
followed by chitin. This indicates a tight and specific binding of
GroEL with the precursor and/or protein products. The associ- 
ation of GroEL with unfolded proteins is reversible to some 
extent upon incubation with ATP-Mg-K (12). Such incuba- 
tion of washed protein samples bound on chitin reduced but 
did not eliminate the amount of GroEL eluted from chitin 
(supplemental Fig. S2B). 

C'-cleavage of B-type BIL2-Rsp in Vivo, in Vitro, and in
Cell-free Systems — BIL2-Rsp was cloned between M and C tags
with one native flanking residue at either end (N'-Leu and
C'-Pro) and overexpressed in E. coli and in a cell-free system.
In both systems, the main product was the M-B-C precursor
with small amounts of M-B and M cleavage products (Fig. 3A).
An additional band of ~70 kDa appeared above the precursor
band when expressed in vivo. This band was identified as the 
E. coli DnaK chaperone protein. It was not detected in the 
overexpressed control protein, M-C. Identity of the above prod- 
ucts was verified by Western blotting, N-terminal sequencing 
of the M-B-C band, MALDI-MS peptide mapping of the M-B 
and DnaK bands, and MALDI MS intact mass analysis of the 
M-B band (Fig. 3A, Table IV, and data not shown). This last 






Table II

MALDI-TOF results of BIL4-Cth splicing junction chymotryptic peptides

Sequence    [M+H]+ calculated   [M+H]+ observed   Mass accuracy   Sequence
position    mass (a) (Da)       mass (Da)         (ppm)

392-405     1349.65             1349.71           44.45           GSASRVDCGGLTGL
392-418     2634.26             2634.80           205             GSASRVDCGGLTGLNSGLTTNPGVSAW

(a) Mass calculated with carboxyamidomethyl cysteine modification.
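
The calculated masses and accuracies in Table II can be reproduced with a short script. This is a sketch under stated assumptions, not the authors' procedure: it assumes monoisotopic residue masses, a singly protonated ion, and a +57.021464 Da carboxyamidomethyl group on each cysteine, and it defines mass accuracy as (observed − calculated) / calculated × 10^6.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565            # H2O added for the free N and C termini
PROTON = 1.007276            # charge carrier of the [M+H]+ ion
CARBOXYAMIDOMETHYL = 57.021464  # assumed modification on each Cys

def mh_plus(peptide, modified_cys=True):
    """Monoisotopic [M+H]+ mass of a linear peptide, in Da."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
    if modified_cys:
        mass += CARBOXYAMIDOMETHYL * peptide.count("C")
    return mass

def ppm_error(calculated, observed):
    """Relative mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6
```

For the two junction peptides, `mh_plus` gives values that agree with the calculated column to two decimal places, and `ppm_error(1349.65, 1349.71)` is close to the tabulated 44.45 ppm.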

Table III

Electrospray Ionization TOF results of BIL4-Cth C-terminal tryptic peptides

Sequence    Calculated   Observed        Mass accuracy   Sequence
position    mass (Da)    mass (a) (Da)   (ppm)

118-135     2108.9545    2109.0533       46.8            VDDFHTYHVGDNEVLVHN
113-135     2760.2927    2760.3261       12.1            VYNFKVDDFHTYHVGDNEVLVHN

(a) Observed masses are an average of [M+2H]2+, [M+3H]3+, and
[M+4H]4+ masses for the first peptide and of [M+3H]3+ and [M+4H]4+
masses for the second peptide.
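
The footnote to Table III reports observed masses as averages over several charge states. One plausible reading, sketched below as an assumption rather than as the authors' stated method, is that each m/z peak is converted to a neutral mass with M = z × (m/z − 1.007276) and the results are averaged. The peaks in the example are computed from the calculated mass in Table III for illustration; they are not measured values.

```python
PROTON = 1.007276  # mass of a proton in Da

def mz_from_neutral(neutral_mass, z):
    """Expected m/z of the [M+zH]z+ ion for a given neutral mass."""
    return (neutral_mass + z * PROTON) / z

def neutral_from_mz(mz, z):
    """Neutral mass implied by one m/z peak of charge z."""
    return z * (mz - PROTON)

def average_neutral_mass(peaks):
    """Average the neutral masses implied by a list of (m/z, charge) peaks."""
    masses = [neutral_from_mz(mz, z) for mz, z in peaks]
    return sum(masses) / len(masses)

# Illustrative peaks derived from the calculated mass of peptide 118-135
# (Table III); these are NOT measured data.
CALCULATED_MASS = 2108.9545
EXAMPLE_PEAKS = [(mz_from_neutral(CALCULATED_MASS, z), z) for z in (2, 3, 4)]
RECOVERED_MASS = average_neutral_mass(EXAMPLE_PEAKS)
```

Because the example peaks are generated from the calculated mass, `RECOVERED_MASS` returns that mass; with real spectra the same function would average the slightly different values from each charge state.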

Table IV

MALDI-TOF results of BIL domains splicing and cleavage products intact mass

Clone           Probable    [M+H]+ calculated   [M+H]+ observed   Mass accuracy
                product     mass (kDa)          mass (kDa)        (%)

BIL4-Cth        M-B         58.038              58.552            0.89
                M           43.092              43.983            2.06
                B           14.963              15.050            0.58
BIL2-RSP-only   M-B         56.782              56.071            1.25
                DnaK        69.118              69.255            0.19
                M-B-C       64.141              65.430            2.00
BIL1-RSP        GroEL/M-C   57.332/57.355       58.317            1.72
RCA-1522        GroEL       57.332              57.810            0.83


analysis gave a measured mass of 56.071 kDa, slightly smaller 
than the expected mass of the putative M-B product. 
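The direction of each deviation in Table IV can be checked by recomputing a signed percentage. This is a minimal sketch, assuming the mass accuracy column is the unsigned form of (observed - calculated) / calculated; the helper name `signed_percent` is illustrative, and the GroEL/MC row is omitted because it lists two calculated masses.

```python
# (clone, product, calculated kDa, observed kDa) copied from Table IV.
rows = [
    ("BIL4-Cth", "MB", 58.038, 58.552),
    ("BIL4-Cth", "M", 43.092, 43.983),
    ("BIL4-Cth", "B", 14.963, 15.050),
    ("BIL2-RSP-only", "M-B", 56.782, 56.071),
    ("BIL2-RSP-only", "DnaK", 69.118, 69.255),
    ("BIL2-RSP-only", "M-B-C", 64.141, 65.430),
    ("RCA-1522", "GroEL", 57.332, 57.810),
]


def signed_percent(calculated: float, observed: float) -> float:
    """Deviation of the observed mass from the calculated mass, in percent."""
    return (observed - calculated) / calculated * 100.0


for clone, product, calc, obs in rows:
    print(f"{clone:14s} {product:6s} {signed_percent(calc, obs):+.2f} %")
```

Among these rows, only the BIL2-Rsp M-B product has an observed mass below the calculated one, which matches the remark in the text that the measured 56.071 kDa is slightly smaller than expected.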

To examine the in vitro activity of BIL2-Rsp, the overex- 
pressed M-B-C precursor was isolated by sequential affinity 
columns (amylose followed by chitin) and was incubated in the 
extraction buffer at different temperatures for different time
periods. Increasing amounts of the M-B product were clearly
detected within 1 day at 4 °C (Fig. 3B). The presence of the M
band may be attributed to N'-cleavage of the BIL domain;
however, the complementary B-C band was not detected. Alternatively,
the M band could have resulted from protein degradation.

Similar results were observed when BIL2-Rsp domain was 
cloned with its full native flanks (data not shown). However, 
this clone also underwent cleavage at an Arg-Arg dipeptide
present in the N'-flank of the BIL domain, as verified by N-terminal
sequencing. This cleavage was also observed when the
flanks were cloned without the BIL domain (data not shown). 
Thus, we suggest that this activity is unrelated to the BIL 
domain and is probably due to an E. coli protease (perhaps 
OmpT) that can cleave the BIL domain flank. 

No Activity of B-type Rca-1522 BIL. The 1522-Rca B-type BIL
domain is natively present in a very large R. capsulatus pro- 
tein. The domain is preceded by 1821 residues and is followed 
by 52 residues. The upstream flank of this BIL domain includes 
RTX (repeats-in-toxin) calcium-binding repeat motifs, characteristic
of secreted proteins (13). The BIL domain was cloned
with 36 N'-flanking residues and 35 C'-flanking residues in the
double tag expression vector. Overexpression of the vector
yielded only the M-B-C precursor and E. coli GroEL protein as
verified by Coomassie Blue staining, Western blotting, N-terminal
sequencing, and MS analysis (Table IV and supplemental
Fig. S2A).



Isolated M-B-C precursor was incubated in vitro at 4 or 37 °C 
in two different environments of pH 7.4 and 8.5. No products of 
the precursor were detected under any of these conditions. 

Species and Protein Host Distribution of BIL Domains — BIL 
domains were identified originally in species from Gram-negative
α, β, and γ Proteobacteria and from Gram-positive Actinobacteria
and the Bacillus/Clostridium group (1). Further
data base searches now broaden the taxonomic range of BIL
domains to major bacterial divisions and lineages (supplemental
Table S-II). A-type BIL domains were found in δ Proteobacteria,
Cyanobacteria, Spirochaetes, Planctomycetes, and
Verrucomicrobia. B-type BIL domains were found in α Proteobacteria,
Rhizobium, and Silicibacter species.

Sequence analyses of over a hundred identified BIL flanks 
reconfirmed our previous observation of the nature of the BIL 
domain hosts. BIL domains are present in homologs of known 
and predicted secreted proteins. This is exemplified by Streptomyces
avermitilis, Verrucomicrobium, and Gloeobacter A-type
BIL domains that are found downstream of long (400-5400 residues)
Rhs core elements. Rhs elements are composite genetic
elements, and their cores are believed to be cell-surface ligand- 
binding proteins (14). The BIL domains are present in the hyper- 
variable core extension region that can be shuffled between the 
core and downstream open-reading frame regions. 

DISCUSSION 

In this study, we show that a typical A-type BIL domain is 
capable of protein splicing without a C'-nucleophilic +1 residue
and that B-type BIL domains can cleave their N' or C' ends.
Both types of domains are not uncommon, appearing in diverse
bacterial divisions. These findings reflect the auto-processing
nature of intein-like domains. We explain the N'- and C'-cleavage
of B-type BIL domains by reactions occurring in the
canonical intein protein-splicing mechanism and propose an
alternative pathway for A-type BIL domain splicing. Our results
suggest that the biochemical activities of the BIL domains
are distinct from inteins, and their native biological function is 
probably protein modification by splicing and cleavage activity. 

Protein-splicing Mechanism without a Nucleophilic +1 Res- 
idue — Intein protein-splicing mechanism was largely deter- 
mined by mutational analysis of a few representative intein 
domains (2, 6, 15-18). This allowed the delineation of the 
biochemical reactions of protein splicing and supported splicing 
as the native activity of inteins. Other evidence for the nature
of intein activity includes the high efficiency of intein protein-splicing,
intein distribution in species and host proteins, and
the function of intein genes as selfish genetic elements (19). 

Currently, the accepted mechanisms for intein protein-splicing
require a Cys, Ser, or Thr +1 residue at the intein immediate
C'-flank. This nucleophilic +1 residue is crucial for the
trans-esterification step and for the final acyl rearrangement
(Fig. 4, steps 2A and 4A). In inteins with N'-Ala-1, the nucleophilic
+1 residue directly attacks the peptide bond at the intein
N' end (16). Mutating the intein active site residues, including
the +1 nucleophilic residue, abolishes splicing or leads to cleavage
of the intein C' end, N' end, or both (15, 20).

In our study, the major products of BIL4-Cth expression
were N'- and C'-cleavages, whereas protein splicing was ap-



Fig. 2. N'-cleavage activity of BIL1-Rsp.
Protein products from E. coli overexpression
of M-B-C construct with
B-type BIL1-Rsp were eluted from amylose
(A), chitin (C), or both (A+C) affinity
columns or analyzed in total cell lysate
(T). Proteins were separated on SDS-PAGE
and either stained with Coomassie
Blue or detected by anti-M, anti-C, or
anti-GroEL (AntiG) antibodies. See "Results"
for discussion of GroEL cross-detection
by anti-C antibodies.



(Fig. 2 image: Coomassie-stained gel and Western blots probed with anti-M and anti-C; lanes T, A, C, and A+C; marked bands MBC (72 kD), GroEL and M* (58 kD), M (43 kD), and BC (28 kD).)

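(Fig. 3 image: panel A, Coomassie-stained gel, Western blots probed with anti-M, anti-C, and anti-DnaK, and cell-free translation, lanes A, C, and T; panel B, 0 h and 24 h time points; marked bands DnaK (70 kD), MBC (65 kD), MB (56 kD), and M (43 kD).)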

Fig. 3. C'-cleavage activity of BIL2-Rsp. A, left, protein products from E. coli overexpression of M-B-C construct with B-type BIL2-Rsp were
eluted from either amylose (A) or chitin (C) affinity columns or analyzed in total cell lysate (T). Proteins were separated on SDS-PAGE and either
stained with Coomassie Blue or detected by anti-M, anti-C, or anti-DnaK antibodies. A, right, proteins translated in vitro in a cell-free system were
labeled with [35S]Met. B, in vitro incubation of a purified precursor at 4 °C.




(Fig. 4 image: reaction schemes for A-type BILs (left) and inteins (right), running from the precursor to the excised BIL/intein domain and the spliced protein. Labeled steps: 1, N-S acyl shift; 2A, transesterification; 2B, Asn cyclization; 3A, Asn cyclization; 3B, aminolysis; 4, succinimide hydrolysis; 4A, S-N acyl shift. Labeled species: thioester intermediate, branched intermediate, cleaved intermediates, and spliced intermediates.)

Fig. 4. Canonical and proposed protein-splicing mechanisms for inteins and A-type BIL domains. The intein/BIL domain is marked
as a black rectangle flanked by an N-terminal flank (N) and a C-terminal flank (C). Right, canonical intein protein-splicing mechanism. Left,
proposed protein-splicing mechanism of A-type BIL domains lacking a C'-nucleophilic residue.



proximately a quarter to a fifth of the A-type BIL domain 
activity with almost complete processing of the precursor. Most 
probably, the initial cleavage activity was at the C' end, producing
the M-B and C products, followed by additional N'-cleavage
of the M-B product, producing the M and B products.



This is supported by the relative amounts of the final products 
and the absence of the B-C product. 

Our results show protein splicing of an A-type BIL with 
conserved sequence features closely related to inteins including 
all of the active site residues apart from the +1 residue. Hence, 






we propose a modified protein-splicing mechanism for A-type
BIL domains. The mechanism is similar to the canonical protein-splicing
mechanism of inteins, only differing in the nature
of the nucleophilic attack on the thioester bond at the N' end of
the BIL domain.

Our suggestion includes the following steps of protein splicing
in A-type BIL domains (Fig. 4). (i) A thioester is formed at
the N' end of the domain by the N-S acyl shift (Fig. 4, step 1) by
attack of the thiol group of the conserved Cys-1 residue on the
carbonyl group of the peptide bond N-terminal to Cys-1. This
reaction is the same as the first step of canonical intein protein
splicing (15, 18, 20). (ii) Concomitantly, the conserved Asn
residue at the C' end of the domain undergoes cyclization into an
aminosuccinimide ring, cleaving the peptide bond at the domain
C' end (Fig. 4, step 2B). This step generates two intermediate
products: the N'-flank covalently connected to the BIL
domain by a thioester bond and the detached C'-flank. This
reaction also occurs in intein protein splicing but only after
ligation of the two intein flanks (Fig. 4, step 3A) (5, 21). In
inteins, premature Asn cyclization results in C'-cleavage and
no splicing (22). Although the timing of Asn cyclization is
tightly controlled in inteins, it can still occur when other steps
of the splicing are blocked by mutations at the N'- and/or
C'-splice junction (17, 23-25). (iii) The free N terminus of the
C'-flank performs an aminolysis reaction of the labile thioester
bond formed in step i at the N' junction of the domain.
This reaction ligates the two BIL domain flanks with a peptide
bond and releases the BIL domain from its N'-flank. This step
probably occurs immediately after step ii to prevent the dissociation
of the C'-flank from the N'-flank and BIL domain. (iv)
Finally, the BIL domain C'-aminosuccinimide ring hydrolyzes
into Asn or iso-Asn, similarly to inteins (Fig. 4, step 4) (26).
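The difference in step order between the two pathways can be made explicit with a small data sketch. It is illustrative only: the step labels follow Fig. 4, but the list layout, the names `CANONICAL_INTEIN`, `PROPOSED_A_TYPE_BIL`, and `LIGATION_STEPS`, and the choice to treat trans-esterification and aminolysis as the flank-joining reactions are assumptions made for this example. The lists are simplified and omit side reactions.

```python
# Step labels follow Fig. 4; the data layout is an illustrative assumption.
CANONICAL_INTEIN = [
    ("1", "N-S acyl shift"),
    ("2A", "transesterification"),
    ("3A", "Asn cyclization"),
    ("4A", "S-N acyl shift"),
]
PROPOSED_A_TYPE_BIL = [
    ("1", "N-S acyl shift"),
    ("2B", "Asn cyclization"),
    ("3B", "aminolysis"),
    ("4", "succinimide hydrolysis"),
]

# Reactions assumed here to be the ones that join the two flanks.
LIGATION_STEPS = {"transesterification", "aminolysis"}


def cleaved_before_ligation(pathway):
    """True if Asn cyclization (cleavage at the C' junction) precedes ligation."""
    names = [name for _, name in pathway]
    cleavage = names.index("Asn cyclization")
    ligation = next(i for i, n in enumerate(names) if n in LIGATION_STEPS)
    return cleavage < ligation


print(cleaved_before_ligation(CANONICAL_INTEIN))     # False
print(cleaved_before_ligation(PROPOSED_A_TYPE_BIL))  # True
```

In this representation the proposed A-type BIL pathway is the only one in which the C'-flank is cut free before the flanks are joined, which is the ordering the text later links to a higher frequency of cleavage side products.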

An aminolysis reaction, involving an attack of the C'-amine on
an N'-ester, was proposed previously to occur in intein protein
splicing (27, 28). A detailed analysis of representative inteins
established the canonical protein-splicing mechanism and
ruled out aminolysis as part of the process (15, 20). Considering
our experimental results and the various residues in the +1
position of A-type BIL domains, we suggest that these domains
protein splice with an aminolysis reaction.

Recently, aminolysis was proposed as part of a peptide-splic- 
ing activity of the proteasome that generates the displayed 
variant antigenic peptides (8). The cleaved peptides within the 
proteasome are attached transiently from the C' end to Thr
residues by ester bonds (21). Vigneron et al. (8) suggest that the
N' end of another cleaved peptide from the same protein attacks
this bond in an aminolysis reaction, ligating the two
peptides. Aminolysis also occurs in other biological reactions,
including the attachment of myristate to the N' end of proteins
by N-myristoyltransferase (29).

Why are inteins integrated upstream to Cys, Ser, or Thr 
residues when, as we show here, protein-splicing can proceed 
with other residues in this position? Being able to successfully 
integrate in a wider range of sites seems highly advantageous 
for selfish genetic elements such as inteins (19, 30). We believe 
the answer to this question is related to the differences be- 
tween the mechanisms for protein splicing in inteins and in 
A-type BIL domains. The intein domain and its flanks remain 
covalently attached until ligation of the flanks and release of 
the intein (Fig. 4). In our proposed mechanism for A-type BIL
domains, the C'-flank is detached from the BIL and its N'-flank
before its ligation to the N'-flank. This may lead to a higher
frequency of N'- and C'-cleavage side products. Such partial
splicing in inteins will reduce the amount of mature (spliced)
host proteins, which are typically conserved and crucial proteins,
and might negatively affect cell survival. Perhaps even
more harmful is the possible dominant-negative effect of the
cleaved byproducts of intein hosts. In contrast, partial splicing
of BIL domains (i.e. N'- and/or C'-cleavage) may serve for
increasing the protein host variability (1).

Our results, together with previous reports of other atypical
intein protein-splicing mechanisms (9), show that this activity
can proceed by several alternative and partially overlapping 
biochemical reactions. Thus, the canonical intein protein-splic- 
ing mechanism may need to be expanded, or its scope may need 
to be limited. Aminolysis and perhaps other atypical mecha- 
nisms may be the way some inteins and other HINT domains 
protein-splice. 

Cleavage Mechanisms of B-type BIL Domains — The B-type 
BIL domains were found by us to auto-catalytically cleave their
N' or C' ends. This activity is analogous to intein protein-splicing
side reactions and is common in N-terminal rearrangements
of auto-processing proteins (2). Both intein and BIL
domains have conserved Cys or Ser in position 1 whose thiol or
hydroxyl groups are essential for the acyl rearrangement at the
N terminus. Thus, the N'-peptide-bond of BIL1-Rsp could be
converted into a thioester through the N-S acyl shift, similarly
to inteins (Fig. 4, step 1). In inteins, this reaction is followed by
trans-esterification of the thioester by the side chain of the +1
residue, forming a branched intermediate and leading to splicing
product formation. Such products were not obtained in the
BIL1-Rsp precursor expression, suggesting that the labile thioester
was hydrolyzed by water or by an external nucleophile.
We do not exclude the possibility that this cleavage was coupled
to ligation of the upstream flank with an external nucleophile,
similar to the attachment of cholesterol to the Hedge domain
upstream of the Hog HINT domain. Such a ligation would
modify the M tag and give it a higher mass. One of the
as yet uncharacterized BIL1-Rsp products may correspond to this
putative product.

A previously proposed mechanism for C'-cleavage of the Chy
Rl intein mutant (9) and for Pab PolII intein (31) can explain
the C'-auto-cleavage of BIL2-Rsp. According to this mechanism, an
attack of the BIL domain Ser-1 hydroxyl group on a peptide
bond carbonyl at the C' region of the domain would form an
ester bond through the N-O acyl shift, which in turn can be
hydrolyzed, detaching the BIL domain from its C'-flank (9).
This proposed mechanism is independent of a C'-nucleophilic
residue. Assuming that BIL domains have the HINT fold, their
N' end is in a position to cleave their C' region.

Our heterologous conditions of protein expression may alter 
the native activity of BIL domains. Overexpression in E. coli 
cells and changes in the domain context (BIL domain flanks), 
as well as in vitro conditions such as redox environment or 
temperature, may alter the protein in vivo fold and function. 
Nevertheless, in light of extensive experiments in other pro- 
teins and HINT domains, we assume that the BIL domain 
activity we observed is related to their native one. Improper 
folding of flanked B-type BIL domains may have triggered the 
overexpression of chaperones (DnaK, GroEL) (12). We propose 
that the chaperones, which were co-purified with B-type BIL 
but were absent in A-type BIL or the control vector, are not 
merely byproducts of the heterologous expression system. 
Chaperones may be involved in the proper folding,
extracellular targeting, or biological activity of BIL domains. Attachment of
chaperones to the BIL precursor may also spatially block its 
splicing activity. 

Biological Roles of Different Types of HINT Domains— The 
HINT superfamily currently includes four separate families: 
inteins; Hogs; A-type BIL; and B-type BIL domains. All of the 
families are homologous and share sequence, structure, and 
biochemical properties (2, 4, 6, 32). Yet each family is distinct 






in specific sequence features, protein host context, and biolog- 
ical roles. Members of each family can be diverse in sequence 
and are still found occasionally in new protein and phylogenetic 
contexts. It is likely that other HINT families will be discovered 
and characterized. Thus, identifying the family of a HINT 
domain can be an additional challenge to recognizing the do- 
main as a HINT type. 

Sequence motifs and structure folds characterizing the HINT
superfamily and those specific to inteins, Hogs, and BIL domains
have been described previously (1, 33, 34). Most inteins
also include a central homing-endonuclease domain (35) not
found in the other known HINT families. Inteins are also
integrated in conserved positions of essential proteins. Both
these features are a consequence of the selfish element nature 
of intein genes (19, 30). Hog domains are located upstream to 
the cholesterol-binding domain and downstream to the Hedge 
domains and to the Wart and Ground domains of nematodes 
(36). The role of Hog domains in hedgehog proteins and per- 
haps also in the nematode proteins is post-translational mod- 
ification in the maturation process of their host protein. 

Less information is available for the two known BIL do- 
mains. Nevertheless, the experimental and computational re- 
sults we show in this work support our initial hypotheses. Most 
BIL domains are present in variable positions of non-conserved 
proteins. Many BIL host proteins also include motifs, repeats, 
and domains that characterize extracellular protein regions. 
We show here and in the first report of the BIL domains (1) that
the biochemical activity of BIL domains includes protein splicing
and auto-cleavage of their hosts. We suggest that the biological
role of BIL domains is to increase the variability of their
hosts, mainly in extracellular protein regions, by cis- and trans-ligation
of proteins and other moieties to the hosts.

Acknowledgments — We thank Prof. Meir Wilcheck for supportive 
suggestions and Prof. Amnon Horovitz for samples of GroEL and its 
antibodies. We thank the Mass Spectrometry unit of the Weizmann 
Institute of Science (Rehovot, Israel) and The Smoler Protein Center, 
Department of Biology (Technion, Israel). 

REFERENCES 

1. Amitai, G., Belenkiy, O., Dassa, B., Shainskaya, A., and Pietrokovski, S. (2003) Mol. Microbiol. 47, 61-73
2. Paulus, H. (2000) Annu. Rev. Biochem. 69, 447-496
3. Porter, J. A., Ekker, S. C., Park, W. J., von Kessler, D. P., Young, K. E., Chen, C. H., Ma, Y., Woods, A. S., Cotter, R. J., Koonin, E. V., and Beachy, P. A. (1996) Cell 86, 21-34
4. Perler, F. B. (1998) Cell 92, 1-4
5. Xu, M. Q., Comb, D. C., Paulus, H., Noren, C. J., Shao, Y., and Perler, F. B. (1994) EMBO J. 13, 5517-5522
6. Perler, F., Noren, C., and Wang, J. (2000) Angew. Chem. Int. Ed. Engl. 39, 450-466
7. Southworth, M. W., Yin, J., and Perler, F. B. (2004) Biochem. Soc. Trans. 32, 250-254
8. Vigneron, N., Stroobant, V., Chapiro, J., Ooms, A., Degiovanni, G., Morel, S., Van Der Bruggen, P., Boon, T., and Van Den Eynde, B. J. (2004) Science 304, 587-590
9. Amitai, G., Dassa, B., and Pietrokovski, S. (2004) J. Biol. Chem. 279, 3121-3131
10. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Nucleic Acids Res. 25, 3389-3402
11. Henikoff, S., Henikoff, J. G., Alford, W. J., and Pietrokovski, S. (1995) Gene (Amst.) 163, 17-26
12. Bukau, B., and Horwich, A. (1998) Cell 92, 351-366
13. Coote, J. (1992) FEMS Microbiol. Rev. 8, 137-161
14. Yong Dong, W. (1998) J. Bacteriol. 180, 4102-4110
15. Chong, S., Shao, Y., Paulus, H., Benner, J., Perler, F. B., and Xu, M.-Q. (1996) J. Biol. Chem. 271, 22159-22168
16. Southworth, M. W., Benner, J., and Perler, F. B. (2000) EMBO J. 19, 5019-5026
17. Xu, M.-Q., and Perler, F. B. (1996) EMBO J. 15, 5146-5153
18. Romanelli, A., Shekhtman, A., Cowburn, D., and Muir, T. W. (2004) Proc. Natl. Acad. Sci. U. S. A. 101, 6397-6402
19. Pietrokovski, S. (2001) Trends Genet. 17, 465-472
20. Shao, Y., Xu, M. Q., and Paulus, H. (1996) Biochemistry 35, 3810-3816
21. Groll, M., and Huber, R. (2003) Int. J. Biochem. Cell Biol. 35, 606-616
22. Wood, D. W., Wu, W., Belfort, G., Derbyshire, V., and Belfort, M. (1999) Nat. Biotechnol. 17, 889-892
23. Cooper, A. A., Chen, Y. J., Lindorfer, M. A., and Stevens, T. H. (1993) EMBO J. 12, 2575-2583
24. Southworth, M. W., Amaya, K., Evans, T. C., Xu, M. Q., and Perler, F. B. (1999) BioTechniques 27, 110-120
25. Chong, S., Montello, G. E., Zhang, A., Cantor, E. J., Liao, W., Xu, M. Q., and Benner, J. (1998) Nucleic Acids Res. 26, 5109-5115
26. Shao, Y., Xu, M. Q., and Paulus, H. (1996) Biochemistry 34, 10844-10850
27. Clarke, N. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 11084-11088
28. Perler, F. B., and Adam, E. (2000) Curr. Opin. Biotechnol. 11, 377-383
29. Farazi, T. A., Waksman, G., and Gordon, J. I. (2001) Biochemistry 40, 6335-6343
30. Liu, X. Q. (2000) Annu. Rev. Genet. 34, 61-76
31. Mills, K. V., Manning, J. S., Garcia, A. M., and Wuerdeman, L. A. (2004) J. Biol. Chem. 279, 20685-20691
32. Pietrokovski, S. (1998) Protein Sci. 7, 64-71
33. Hall, T. M., Porter, J. A., Young, K. E., Koonin, E. V., Beachy, P. A., and Leahy, D. J. (1997) Cell 91, 85-97
34. Dalgaard, J. Z., Moser, M. J., Hughey, R., and Mian, I. S. (1997) J. Comput. Biol. 4, 193-214
35. Belfort, M., and Roberts, R. (1997) Nucleic Acids Res. 25, 3379-3388
36. Bürglin, T. R. (1996) Curr. Biol. 6, 1047-1050
37. Mehlman, T., Benjamin, M., Merhav, D., Oaman, F., Ben-Asouli, Y., Goldshleger, R., Karlish, S., and Shainskaya, A. (2002) Proceedings of the 50th Conference of American Society for Mass Spectrometry, Orlando, June 2-6, 2002, American Society for Mass Spectrometry, Santa Fe, NM



SUPPLEMENTAL DATA 



Table S-I: N-terminal sequencing results of BIL domains precursor and cleavage products.

Clone               Protein product   Calculated seq   Observed seq

BIL1-RSP            B-C               CFTPGT           XFTPGT
RCA-1522            M-B-C             MKTEEG           MKTEEG
BIL2-RSP(-flanks)   M-B-C             MKTEEG           MN(a)WEEG
BIL2-RSP(+flanks)   B-C               RKGPKM           RKGPKM

(a) A weak signal of Lys was also observed in this position.



Table S-II: Taxonomical distribution of BIL domains

Species and strain                               Taxonomic group    No. of BIL domains and type

Rhodobacter capsulatus SB1003                    α proteobacteria   14 B(a)
Rhodobacter sphaeroides 2.4.1                    α proteobacteria   2 B(a)
Silicibacter pomeroyi DSS-3                      α proteobacteria   16 B(a)
Brucella melitensis 16M                          α proteobacteria   1 B
Magnetospirillum magnetotacticum MS-1            α proteobacteria   1 A, 5 B(a)
Methylobacterium extorquens AM1                  α proteobacteria   1 B(a)
Rhizobium leguminosarum bv. viciae 3841          α proteobacteria   1 B
Neisseria meningitidis Z2491                     β proteobacteria   1 A
Neisseria meningitidis MC58                      β proteobacteria   3 A
Neisseria meningitidis FAM18                     β proteobacteria   6 A(a)
Neisseria gonorrhoeae FA1090                     β proteobacteria   6 A
Chromobacterium violaceum ATCC 12472             β proteobacteria   1 A
Pseudomonas syringae DC3000                      γ proteobacteria   1 A(a)
Pseudomonas fluorescens PfO-1                    γ proteobacteria   1 A(a)
Pseudomonas fluorescens [strain illegible]       γ proteobacteria   1 A(a)
[illegible]                                      [illegible]        1 A(a)
[illegible]                                      [illegible]        1 A
Myxococcus xanthus DK1622                        δ proteobacteria   3 A
Leptospira interrogans 56601                     Spirochaetes       3 A
Streptomyces coelicolor A3(2)                    Actinobacteria     1 A
Streptomyces avermitilis MA-468                  Actinobacteria     3 A
Thermobifida fusca YX                            Actinobacteria     1 A(a)
Clostridium thermocellum ATCC 27405              Clostridia         23 A(a)
Pirellula species 1                              Planctomycetes     1 A
Gemmata obscuriglobus UQM 2246                   Planctomycetes     2 A(a)
Gloeobacter violaceus PCC 7421                   Cyanobacteria      7 A
Verrucomicrobium spinosum DSM 4136               Verrucomicrobia    3 A(a)
Uncultured bacterium 582 clone ebac080-L028H02   proteobacteria     1 B(a)
Unknown species(b)                               unknown            2 B(a)

(a) Genome is not fully sequenced yet, so the number of BIL domains in this strain could possibly be higher.

(b) Sequences from this putative bacteria were DNA contaminants of the Wolbachia species D. melanogaster
genome.






(Figure S1 image: Coomassie-stained gel; lanes T, A, C, and A+C.)

Figure S1. Protein splicing products of BIL4-Cth. Coomassie staining of E. coli overexpression of M-B-C
construct containing A-type BIL4-Cth domain. Protein products were eluted from amylose (A), chitin
(C) or both (A+C) affinity columns, or analyzed in total cell lysate (T).



(Figure S2 image: Coomassie-stained gel, Western blots probed with anti-M, anti-C, and anti-G, and cell-free translation probed with anti-C and anti-G, with and without ATP; lanes T, A, C, and A+C.)

Figure S2. Experimental analysis of 1522-Rca. Protein products from in vivo growth were eluted from
either amylose (A) or chitin (C) or both (A+C) affinity columns, or analyzed in total cell lysate (T).
Proteins were separated on SDS-PAGE and either stained with Coomassie or detected by anti-M, anti-C or
anti-GroEL (Anti-G) antibodies. Proteins translated in vitro in a cell-free system were labeled with [35S]Met
and their autoradiogram is shown.






Figure S3 



BIL4-Cth (136aa) : 

CFVAGTMILT ATGLVAIENI KAGDKVIATN PETFEVAEKT VLETYVRETT 
ELLHLTIGGE VIKTTFDHPF YVKDVGFVEA GKLQVGDKLL DSRGNVLWE 
EKKLEIADKP VKVTNFKVDD FHTYHVGDNE VLVHNA 

BIL1-Rsp (142aa) :

CFTPGTLIAT VRGEVAVEAL AAGDRIVTRD NGLQPLRWIS RRRLDHATLA 
AFPHLKPVLI EKGSLGPDLP DRDMMVS PNH RILVSRDRTA LHFDAPEVLV 
AAKHLVGPRG IREVECSGTT YLHLMFDRHE WLANGAWTE SF 

BIL2-Rsp (134aa) : 

S LTAGT PVTiT LAGIRPAEGI RPGDRLVARS GAVAVLAAEM TTLPQTEMVA 
IGASTIiAHGQ PDETLLVPAD QPLLLRGARA ELLYGQSPW LPARRLVDGQ 
LTRLLPMEDV DLVTLTFAAP 
AAIYASELHP VTR 

1522-Rca (142aa) : 

CFTPGTLIA TPKGERLVEEL REGDKILTRD NGIQEIRWIG RTDLTRAQLM 
ATPHLKPVLI RAGSLGNGLP ERDMLVS PNH RMLVANE RT A LYFEEHEVLV 
AAKHLIDNRG VKPVETLGTS YIHFMFDRHE WLGNGAWTE SF 



Figure S3. Protein sequence of the analyzed BIL domains. Conserved sequence motifs as described in (1) 
are marked in bold, including the +1 residue. 






Molecular Microbiology (2003) 47(1), 61-73 



Distribution and function of new bacterial intein-like 
protein domains 



Gil Amitai,1 Olga Belenkiy,1 Bareket Dassa,1
Alla Shainskaya2 and Shmuel Pietrokovski1*

1Molecular Genetics Department and 2Mass
Spectrometry Unit, The Weizmann Institute of Science,
Rehovot 76100, Israel.

Summary 

Hint protein domains appear in inteins and in the C-terminal
region of Hedgehog and Hedgehog-like animal
developmental proteins. Intein Hint domains
are responsible and sufficient for protein-splicing of
their host-protein flanks. In Hedgehog proteins the
Hint domain autocatalyses its cleavage from the N-terminal
domain of the Hedgehog protein by attaching
a cholesterol molecule to it. We identified two new
types of Hint domains. Both types have active site
sequence features of Hint domains but also possess
distinguishing sequence features. The new domains
appear in more than 50 different proteins from diverse
bacteria, including pathogenic species of humans
and plants, such as Neisseria meningitidis and
Pseudomonas syringae. These new domains are
termed bacterial intein-like (BIL) domains. Bacterial
intein-like domains are present in variable protein
regions and are typically flanked by domains that also
appear in secreted proteins such as filamentous
haemagglutinin and calcium binding RTX repeats.
Phylogenetic and genomic analysis of BIL sequences
suggests that they were positively selected for in different
lineages. We cloned two BIL domains of different
types and showed them to be active. One of the
domains efficiently cleaved itself from its C-terminal
flank and could also protein-splice its two flanks, in
E. coli and in a cell free system. We discuss several
possible biological roles for BIL domains including
microevolution and post translational modification for
generating protein variability.

Introduction 

Hint protein domains appear in two different protein 

Accepted 19 September, 2002. *For correspondence. E-mail pietro@bioinfo.weizmann.ac.il;
Tel. (+972) 8 934 2747; Fax (+972) 8 934 4108.

© 2003 Blackwell Publishing Ltd



families: inteins and Hogs. In both families the domain 
performs similar biochemical reactions but in different bio- 
logical processes. Inteins are selfish genetic elements. 
They are inserted in frame in various protein coding genes 
of diverse prokaryotes and few eukaryotes. The whole 
intein element codes for a protein that is translated with 
the intein host protein and excises itself from it in a pro- 
tein-splicing reaction. All inteins have a Hint domain that 
is responsible and sufficient for the protein-splicing reac- 
tion (Paulus, 2000) (Fig. 1). Most inteins also have a 
homing-endonuclease and DNA-binding domains. These 
domains mediate the copying of the intein gene into 
unoccupied intein insertion points (homing). Hog is a C-terminal
protein region found in the Hedgehog developmental
proteins of vertebrates and insects and in three
other nematode protein families (Aspock et al., 1999). The
Hint domain is the N-terminal part of Hog regions and in
Hedgehog proteins it is followed by a cholesterol-binding
domain. In Hedgehog proteins the Hog region mediates
its cleavage from the N-terminal domain of the protein
(Hedge) by attaching a cholesterol molecule to the Hedge
domain (Fig. 1). This activates the Hedge domain that is
then secreted from the cell (Porter et al., 1996a, b).

The first biochemical reactions of intein excision from 
its protein host and Hog detachment from its N-terminal 
domain are identical (Paulus, 2000): the peptide bond N- 
terminal to the end of the Hint domain is converted into a 
thioester (or ester) bond. A trans-esterification reaction
then attaches the C-terminal flank of the intein, or a cholesterol
molecule in Hog proteins, to the N-terminal
domain cleaving it from the Hint domain. Hint domains in
both families also have the same structure fold and
sequence motifs (Dalgaard et al., 1997; Hall et al., 1997;
Pietrokovski, 1998). Interestingly, the phylogenetic distributions
of inteins and Hog proteins, known so far, do not
overlap. Inteins are found in prokaryotes, single cell
eukaryotes, plastids and viruses (Pietrokovski, 2001) and
Hog proteins are found in multicellular animals (Aspock
et al., 1999). The divergence of the Hog protein Hint
domains from intein Hint domains is estimated to have 
occurred at, or prior to, the appearance of metazoa 
(Pietrokovski, 2001). 

Hint domains are necessary for the maturation of the 
Hedgehog proteins in which they are found, and perhaps 
this is the role of Hint domains also in other Hog proteins 








(Fig. 1 image: schematic panels labelled "Intein protein-splicing activity", "Hedgehog Hint-domain cholesterol-ligation activity", and "BIL possible activity: ligation, C-terminal cleavage, protein-splicing, N-terminal cleavage, N- and C-terminal cleavages".)

Fig. 1. Known and possible activity of Hint domains.
Top. Known protein splicing of inteins and cholesterol-ligation dependent N-terminal cleavage of hedgehog Hint domains.
Bottom. Possible functions of BIL domains.
Intein, Hedgehog and BIL Hint domains are shown as dark grey horseshoes with their flanks as ovals. The Hedgehog cholesterol binding domain is shown stippled. The proteins N-terminal ends are on their left.

(Porter et al., 1996a). Inteins, the progenitors of Hog protein
Hint domains, are selfish genetic elements based on
all current evidence (Pietrokovski, 2001). This raises the 
issue whether the origin of inteins themselves is from 
proteins that had a different biological role, and what is 
the nature of that role. Specifically, protein-splicing could 
modulate the molecular function of host proteins although 
no intein is known to do so. 

Here we report the identification of various bacterial 
proteins that include two new types of Hint domains. The 
phylogenetic and genomic distributions of these domains 
are analysed. We show that two of these domains, one 
from each type, are active. One of the domains can pro- 
tein splice and we suggest that the role of BILs is to 
process proteins. In at least some species, including 
pathogenic bacteria, this processing could increase the 
variability of secreted proteins. This might be a new mech- 
anism for generating protein variability. 



Results 

New Hint-like protein domains in bacteria 

Database searches for protein sequences with Hint 
domains identified more than 50 such open reading 
frames (ORFs) in diverse bacterial species (Table 1).
These Hint domains are termed BILs for bacterial intein- 
like domains. Bacterial intein-like domains are 130-155 
aa long and have characteristic sequence motifs of the 
Hint domain (Pietrokovski, 1994; 1998; Hall et al., 1997)
(Fig. 2). These domains are distinct from inteins in having 
additional unique sequence motifs, not being integrated in 
highly conserved sites of essential proteins and in not 
including endonuclease domains. They are also distinct 
from Hog protein Hint domains by: (i) lacking conserved 
motifs characteristic to those domains; (ii) occurring in 
bacteria; and (iii) being flanked by protein domains unlike 
those found in all known Hog proteins. Two types of BILs 



© 2003 Blackwell Publishing Ltd, Molecular Microbiology, 47, 61-73



Bacterial intein-like protein domains 63 



Table 1. Distribution of bacterial intein-like (BIL) domains.

Species and strain                       Taxonomic group              BIL number and type
Rhodobacter capsulatus SB1003            α proteobacteria             14ᵃ B
Rhodobacter sphaeroides 2.4.1            α proteobacteria             2ᵃ B
Brucella melitensis 16 M                 α proteobacteria             1 B
Magnetospirillum magnetotacticum MS-1    α proteobacteria             1ᵃ A, 5ᵃ B
Neisseria meningitidis Z2491             β proteobacteria             1 A
Neisseria meningitidis MC58              β proteobacteria             3 A
Neisseria meningitidis FAM18             β proteobacteria             6 A
Neisseria gonorrhoeae FA1090             β proteobacteria             6 A
Pseudomonas syringae DC3000              γ proteobacteria             1ᵃ A
Mannheimia haemolytica PHL213            γ proteobacteria             1ᵃ A
Streptomyces coelicolor A3(2)            Actinobacteria               1 A
Thermobifida fusca YX                    Actinobacteria               1ᵃ A
Clostridium thermocellum ATCC 27405      Bacillus/Clostridium group   10ᵃ A

a. Genome is not fully sequenced yet, so the number of BILs in this strain could possibly be higher.



(termed A and B) are apparent by specific sequence 
motifs and by features of motifs common to both BILs 
(Figs 2 and 3 and see also Supplementary material
Fig. S1).

BILs are found in α, β and γ proteobacteria (Gram-negative
bacteria), in actinobacteria (high GC Gram-positive bacteria)
and in the Bacillus/Clostridium group (low GC Gram-positive
bacteria). Both their presence and genomic distribution are
variable, even in closely related species and strains. For
example, in three complete and almost complete sequenced
strains of Neisseria meningitidis there are one, three or six
ORFs with BILs, in two different Rhodobacter species there are
two and 14 ORFs with BILs, and whereas one strain of
Pseudomonas syringae has one such ORF, Pseudomonas aeruginosa
and Pseudomonas putida have none. Different BIL types and
inteins co-exist in certain species, e.g. Magnetospirillum
magnetotacticum has both A- and B-type BILs and Thermobifida
fusca has both BILs and inteins (Table 1).

The variability in number of ORFs with BILs in different
species is probably due to gene duplications. Dendrograms
of BIL domains show that all those derived from Neisseria
species cluster together and BILs from different species
subcluster as well. This implies that all Neisseria BILs
duplicated from one ancestor and some are paralogues within
different species (Fig. 3). This is corroborated by the
apparent duplication of some gene loci with BIL ORFs in
these species (not shown). Clustering of BILs from the same
species is also found in Clostridium thermocellum and
Magnetospirillum.



Properties of BIL-domain proteins

BILs are present in ORFs that can code for a few hundred
to a few thousand amino acids. Some of the shorter ORFs
might be non-functional genes because they include in-frame
stop codons and no clear promoter sequences or translation
initiation signals were found. In addition, a few ORFs in
C. thermocellum and different N. meningitidis strains are
truncated, missing N-terminal parts of the BIL and its
N-terminal flank (see Supplementary material Fig. S1).

Several BILs are flanked by domains that are present
in secreted bacterial proteins. In P. syringae and Mannheimia
haemolytica BILs are found in FhaB-like ORFs, near their
C-termini. FhaB is an extremely large Bordetella gene, coding
for a protein of a few thousand amino acids that is a secreted
filamentous haemagglutinin. It functions as an adhesin and is
important for B. pertussis virulence (Smith et al., 2001).
Three of the Rhodobacter capsulatus ORFs with BILs include RTX
repeats. These calcium-binding repeats are found in various
secreted bacterial proteins, including many toxins (Coote,
1992). In N. meningitidis and N. gonorrhoeae some BILs are
found in MafB proteins. These are part of a multiple adhesin
family possibly involved in glycolipid adhesion to cells
(Paruchuri et al., 1990; Naumann et al., 1999). Three other
Neisserial BILs have an HNH nuclease domain in their C-terminal
flanks. HNH domains appear in various DNase and endonuclease
proteins including secreted toxins (James et al., 1996; Belfort
and Roberts, 1997). A domain present in the C-terminal flank of
a BIL in the Gram-positive bacterium T. fusca is also found in a
Salmonella short conserved ORF (GenBank accession NP_454902)
and in the C-terminus of a N. meningitidis FhaB/Haemolysin
protein (gene NMA0688). Both these proteins are from
Gram-negative bacteria and are likely to be secreted.



Features of BIL domains 

Identification of Hint domain motifs in both BIL types 
enabled us to locate BIL residues that correspond to the 
intein protein-splicing active site. Generally, these motifs 
are conserved and similar in nature to those appearing in 








Fig. 2. Conserved motifs of Hint protein domains. Each row shows
conserved motifs from one type of Hint protein domain. Motifs are
ordered left to right in the N- to C-terminal positions along the
protein sequences. Similar motifs are vertically aligned with each
other. Unique A-type and B-type BIL motifs are underlined with hatched
lines. The motifs are shown as sequence logos where the height of
each amino acid is proportional to its conservation in each position.
Positions of the intein protein-splicing active site residues are marked
by asterisks. Protein motifs were found and are displayed as previously
described (Pietrokovski, 1998). The BIL motif sequences and
the distances between consecutive motifs are listed in Supplementary
Table 1. Intein and hedgehog Hint domains are those described by
Aspock et al. (1999) and Pietrokovski (2001). Only intein and hedgehog
motifs common to Hint domains are shown.



inteins and Hog protein Hint domains (Fig. 2). However,
B-type BILs are missing the C-terminal motif of inteins. 
Their final three conserved C-terminal residues are differ- 
ent from those present in inteins. One possible resem- 
blance to the C-termini of inteins is the C-terminal 
penultimate position of B-type BILs. It is either a cysteine, 
serine or threonine. This is the conservation found in the 
position following the C-end of inteins. The SH/OH groups 
on the side chains of these residues in intein host proteins 
are crucial for ligation of the intein C- and N-flanks in the 
protein-splicing reaction (Xu and Perler, 1996). At present, 
we cannot ascertain what is the exact C-terminal end of 
B-type BILs, and some conserved positions at that region 
might belong to the BILs' C-terminal flank. In A-type BILs,
the two C-terminal residues are an invariant histidine-asparagine
pair, identical to the typical C-terminal residues of
inteins. However, the following position in A-type BILs,
which corresponds to the first residue of the C-terminal
flanks of inteins, is not conserved (Fig. 2).

Almost all A-type BIL domains have apparent functional 
protein-splicing active sites, and a few also have flanking 
C-terminal residues found in inteins (serine and threonine).
In order to verify protein-splicing activity we cloned
and expressed an A-type BIL domain from P. syringae.
This domain has typical intein residues in all active site
positions (Fig. 4). To examine the activity of B-type
BIL domains we also cloned one such domain from
R. sphaeroides.

Experimental analysis of A-type and B-type BIL domains 

To experimentally test BIL domain activity we cloned an
A-type and a B-type BIL domain, each between two tag
domains. We then examined the size and nature of the
resulting proteins in an in vitro translation system and after
expression in E. coli.

A gene encoding a chimeric protein, MBC, composed
of the maltose-binding protein (M 43045.8 Da), the P. syringae
(Psy) A-type BIL domain and its downstream threonine
(B 16024.2 Da), and the chitin-binding domain (C
7201.5 Da) was constructed in an expression plasmid.







Fig. 3. BIL domains dendrogram. The dendrogram was computed from a
DNA multiple sequence alignment of 49 mostly complete BIL domains
(Table 1), aligned across 201 positions, coding for 67 amino acids
that could be confidently aligned across the A-type and B-type
sequences (Fig. 2). Nodes with bootstrap values below 440/1000
were collapsed. Bootstrap values above 800/1000 are shown. Bootstrap
values of the nodes grouping all A-type and B-type BILs are
441 and 519, respectively, and the D. melanogaster Hedgehog Hint
domain (Porter et al., 1996a) was used as an outgroup to root the
tree. The dendrogram was calculated by the dnadist program (version
3.5) of the phylip package (Felsenstein, 1989). Similar results were
found by the clustalw program (Thompson et al., 1994) and from
the protein multiple sequence alignment by the phylip protdist and
by the clustalw programs. Species are named as follows: Nme-A
Neisseria meningitidis strain Z2491, Nme-B Neisseria meningitidis
strain MC58, Nme-C Neisseria meningitidis strain FAM18, Ngo Neisseria
gonorrhoeae strain FA1090, Mha Mannheimia haemolytica, Tfu
Thermobifida fusca, Mma Magnetospirillum magnetotacticum, Psy
Pseudomonas syringae, Sco Streptomyces coelicolor strain A3(2),
Cth Clostridium thermocellum, Rca Rhodobacter capsulatus, Rsp
Rhodobacter sphaeroides, Bme Brucella melitensis.
Leaf labels, in the order extracted from the figure: MafB2 Ngo,
MafB1 Ngo, BIL5 Ngo, BIL3 Ngo, B0369 Nme-B, BIL3 Nme-C, BIL6 Ngo,
BIL2 Ngo, BIL4 Nme-C, B0655 Nme-B, BIL5 Nme-C, B0372 Nme-B,
A2115 Nme-A, BIL6 Nme-C, MafB1 Nme-C, BIL2 Nme-C, FhaB Mha,
BIL8 Cth, BIL9 Cth, BIL2 Cth, BIL3 Cth, BIL4 Cth, BIL5 Cth,
BIL6 Cth, BIL1 Cth, 39_9 Tfu, 3875_87 Mma, FhaB Psy, SCP1.201 Sco,
BIL4 Mma, BIL3 Mma, BIL2 Mma, 00126 Rca, 110519 Bme, 00746 Rca,
BIL2 Rsp, 03530 Rca, 01374 Rca, 01522 Rca, 4825 Rsp, 00949 Rca,
01524 Rca, 00588 Rca, 02710 Rca, 01216 Rca, 01523 Rca, 00460 Rca,
00199 Rca, 00459 Rca.



Protein splicing of the MBC precursor would produce MC
and B proteins, C-terminal BIL cleavage would produce
MB and C proteins, and N-terminal BIL cleavage would
produce M and BC proteins. The plasmid was expressed
by an in vitro transcription/translation system. Five distinct
protein products were identified (Fig. 5A). Three protein
bands had weights corresponding to the MBC precursor,
MC (the protein-splicing product) and MB (the C-terminal
cleavage product). The two additional bands had similar
weights of 43 and 45 kDa. Control reactions with a template
of an MC protein (a plasmid without the BIL insert)
yielded two protein bands. One corresponded in weight to
the MC protein and the other to the maltose-binding protein
(43 kDa). Such a product can be seen in chimeric
proteins having the M protein as an N-terminal tag (e.g.
NEB instruction manual 'pMAL protein fusion and purification
system', Catalogue #E8000S).
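
The product bookkeeping for the three possible reactions can be sketched in a few lines. This is only an illustration under simplifying assumptions: product masses are taken as plain sums of the Psy domain masses quoted in the text, ignoring any mass change from the reaction chemistry and which product retains the downstream threonine. The names `DOMAIN_MASS_DA`, `REACTIONS`, `naive_mass` and `predicted_products` are hypothetical and not part of the original work.

```python
# Domain masses (Da) of the Psy MBC construct, as quoted in the text.
DOMAIN_MASS_DA = {"M": 43045.8, "B": 16024.2, "C": 7201.5}

# Products expected from each reaction of the linear M-B-C precursor.
REACTIONS = {
    "protein splicing": ("MC", "B"),
    "C-terminal cleavage": ("MB", "C"),
    "N-terminal cleavage": ("M", "BC"),
}


def naive_mass(product, masses=DOMAIN_MASS_DA):
    """Sum the domain masses (Da) of a product named by its domain letters."""
    return sum(masses[domain] for domain in product)


def predicted_products(masses=DOMAIN_MASS_DA):
    """Map each reaction to its products and their naive masses in kDa."""
    return {
        reaction: {p: round(naive_mass(p, masses) / 1000, 1) for p in products}
        for reaction, products in REACTIONS.items()
    }


if __name__ == "__main__":
    print("precursor MBC:", round(naive_mass("MBC") / 1000, 1), "kDa")
    for reaction, products in predicted_products().items():
        print(reaction, products)
```

Comparing such naive sums with gel bands is only a first-pass check; the assignments in the text rest on the control reactions and on the later Western-blot and mass-spectrometry data.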

The 43 and 45 kDa weight bands identified from the Psy
MBC gene are thus considered side products of premature
transcription or translation stops, not related to the
BIL domain activity. Appearance of the 45 kDa band, not
seen in the control reaction and slightly larger than the
expected weight of M, may be the result of an additional
termination point introduced in the BIL domain. Radioactive
methionine was used to label the reaction products.
Unlike the M and B domains, the C domain has no
methionines and therefore its product without M or B
domains could not be labelled this way.

The approximate relative amounts of the three products
considered specific to complete translation of the BIL-containing
protein were MBC 15%, MB 57% and MC 28%
(Fig. 5A).

The B-type R. sphaeroides (Rsp) BIL2 domain 
(Supplementary Table S1) was similarly cloned in between
the M and C domains but with its 32 aa and 11 
aa N- and C-terminal flanks respectively. Both M and C 
domains now had additional flanks and the sizes of the 
three domains of the Rsp MBC protein were: M 
46109.48 Da, B 13895.03 Da and C 8061.52 Da. Follow- 
ing in vitro translation the products included bands with 




CFAAGTMVST PDG ERAIDTL KVGDIVW SKP EGGGKPFAAA ILATHIRTDQ P I YRLKLKGK 6046 

QENGQAEDES LLVTPGHPFY VPAQHG FVPV IDLKPGDRLO SLADGAS ENT SSEVESLELY 6106 

LPVG KTYNLT VDVGHTFYV G KLKTWVHNT 6135 

Fig. 4. P. syringae FhaB BIL domain sequence. Protein sequence of the Pseudomonas syringae FhaB BIL domain. Regions corresponding to
the six protein-splicing motifs (Pietrokovski, 1998) are underlined with active site residues double underlined. Co-ordinates show the domain
position in the FhaB protein. The length of the protein is estimated to be 6274 amino acids, with the exact position of the N-terminal end
uncertain.



sizes corresponding to the MBC precursor (65 kDa), to 
the M domain (43 kDa), that also appeared in the control 
reaction, and bands with sizes corresponding to MB 
(61.5 kDa) and MC (56 kDa) proteins of this construct 
(Fig. 5B). 

To better quantify the Psy BIL reaction products and
examine them in an in vivo system, the Psy MBC protein was
overexpressed in E. coli and affinity-purified by either of
its two protein tags. The purified proteins again included
bands corresponding to the spliced (MC) and C-terminal
cleaved (MB) products by both predicted mass and
Western-blot analysis (Fig. 6A). The main product was
again the MB protein, as displayed by comparing its
amount to that of the MC protein when both were purified
on amylose beads (Fig. 6A, left panel lane 3).

Identities of the MC and MB Psy BIL protein products
were confirmed by mass spectrometry analysis
(Fig. 6B). The measured mass for the MC protein
(50 602.07 Da) is in close agreement with the expected
mass for an unmodified protein (50 266.39 Da). The MB
protein measured and expected masses are also in close
agreement: 59 332.79 and 59 070.11 Da respectively. A
prominent peak with a mass of 43 303 Da, also observed
in MALDI spectra of the electroeluted 50 kDa MC fragment,
is probably a cross-contamination by traces from the lower
mass band observed on the gel (Fig. 6A). Such cross-contamination
can be observed in gel-purified protein
bands (A. Shainskaya, unpublished). Reactivity of the
lower mass band with antibody against the M-tag (see
Fig. 6A) indicates that this ~43 kDa band is a truncated
product.
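
To make "close agreement" concrete, the relative deviation between the measured and expected masses quoted above can be computed directly. The sketch below uses only the four mass values given in the text; the names `MASSES_DA` and `relative_deviation_percent` are hypothetical, and no acceptance threshold is implied.

```python
# Expected and measured masses (Da) for the two products, as quoted in the text.
MASSES_DA = {
    "MC": {"expected": 50266.39, "measured": 50602.07},
    "MB": {"expected": 59070.11, "measured": 59332.79},
}


def relative_deviation_percent(expected, measured):
    """Return the signed deviation of measured from expected, in percent."""
    if expected <= 0:
        raise ValueError("expected mass must be positive")
    return 100.0 * (measured - expected) / expected


if __name__ == "__main__":
    for name, m in MASSES_DA.items():
        diff = m["measured"] - m["expected"]
        pct = relative_deviation_percent(m["expected"], m["measured"])
        print(f"{name}: {diff:+.2f} Da ({pct:+.2f}%)")
```

Both products deviate from their expected masses by well under one percent.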

Fig. 5. In vitro translation products of a fusion protein including BIL domains.

A. Translation of a chimeric MBC protein with the P. syringae FhaB BIL domain.

B. Translation of a chimeric MBC protein with the R. sphaeroides BIL2 BIL domain.

PAGE separation of in vitro translated and [35S]-methionine labelled pC2C-PsyBIL (A) and pC2C-RspBIL2 (B) plasmids MBC genes.
Translation of plasmids pC2C, pBESTluc and of no DNA served as controls. Molecular weights, in kDa, were estimated by the products of
the pBESTluc plasmid and, in A, by unlabelled protein markers run on the same gel, marked with dotted lines. 'Expressed protein' marks our
identification of the pC2C-PsyBIL and pC2C-RspBIL2 translation products (bands labelled MBC, MB, MC and M).

Fig. 6. In vivo products of a fusion protein including the P. syringae FhaB BIL domain.

A. Left panel, PAGE separation of overexpressed pC2C-PsyBIL plasmid MBC protein product. The second and third lanes are proteins purified
on chitin and amylose affinity columns respectively. The fourth lane is a control showing overexpression of pC2C plasmid MC protein product.
Identity of protein bands from these lanes is shown on the right. Protein markers are shown on the first lane with the masses of protein bands.
The top band from the 3rd lane and the central band from the 2nd lane, corresponding to the masses of MB and MC, respectively, were excised
from the gel and analysed by MALDI-TOF mass spectrometry. Right panel, Western-blot analysis of the overexpressed MBC protein after
purification on chitin beads. Both chitin-purified samples were run on duplicate lanes and blotted to a single nitrocellulose membrane. The
membrane was cut in half and each sample duplicate reacted with either anti-MBP or anti-CBD antibodies. Both antibodies against MBP and
CBD tags reacted with the protein band corresponding to the mass of the MC product. Protein bands corresponding to MB and M products that
appear after purification on chitin beads probably result from non-specific binding by excess amounts of overexpressed protein.

B. MALDI mass spectra (m/z) of the protein products MC, left, and MB, right, electroeluted from Coomassie stained SDS-PAGE gels.

Peptide mass mapping of the MC and MB Psy BIL
reaction products by MALDI analysis further validated
their assignment. In particular, it identified the splice junction
of the MC protein and the C-terminal end of the MB
protein. A peptide corresponding to the ligated ends of the
M and C protein tags (NMSEFGSTSR-C') was identified
in the MC protein with an accuracy of 71 p.p.m. Overlapping
peptides corresponding to the C-terminal end of the
Psy BIL domain (N'-LKTWVHN-C' and N'-TWVHN-C')
were identified in the MB protein with accuracy of 27 and
100 p.p.m., respectively. Additional peptides from both M
and C protein tags have been identified in agreement with
the assignment of the MC and MB products from the SDS-PAGE,
Western blot and MALDI-TOF analyses (see Supplementary
material Figs S1 and S2).
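
The accuracies above are given in parts per million. As a reminder of how such a figure is conventionally computed, here is a minimal sketch. It assumes the standard definition, |observed − theoretical| / theoretical × 10^6; the text does not state the formula used. The function name `ppm_error` and the example masses are hypothetical and are not values from the paper.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Return the absolute mass error in parts per million (p.p.m.)."""
    if theoretical_mass <= 0:
        raise ValueError("theoretical_mass must be positive")
    return abs(observed_mass - theoretical_mass) / theoretical_mass * 1e6


if __name__ == "__main__":
    # Hypothetical masses (Da), chosen only to illustrate the calculation.
    error = ppm_error(observed_mass=1000.071, theoretical_mass=1000.0)
    print(f"{error:.1f} p.p.m.")
```

On this definition, an error of 71 p.p.m. for a peptide of about 1000 Da corresponds to a difference of roughly 0.07 Da.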




Discussion 

BILs are new and distinct Hint domains 

We have identified in three distant groups of bacteria
(proteobacteria, actinobacteria and the Bacillus/Clostridium
group) a protein domain that is homologous to Hint
domains. This new domain appears in non-conserved
regions of hypervariable proteins. Thus, these domains
appear distinct from the Hint domains of inteins and Hog
proteins by the species and proteins in which they appear.
The BIL (bacterial intein-like) domains are separated by
sequence features into A- and B-type domains.

BIL domains are also distinct from inteins by their
global sequence features. We examined BLAST sequence
searches (Altschul et al., 1997) of BIL domains with BIL
domains and with intein sequences. BIL domains were
aligned with other BIL domains with higher scores than
with intein sequences, and their alignments with each
other were across their whole, or almost whole, lengths
(results not shown). This is also a practical way to distinguish
these two related domains from each other.



Sequence to function analysis of BIL domains 

The protein-splicing active site residues are present and 
conserved in BIL domains with one exception. The C- 
terminal ends of both BIL types each differ from that of 
inteins. In A-type BIL domains, an absolutely conserved 
histidine-asparagine motif is present at the C-terminal 
end, identical to the typical C-terminal positions of inteins. 
However, the C-terminal flanking position of this motif is 
not well conserved. Whereas all inteins have only cys- 
teine, serine or threonine in this position, just a few A-type 
BIL domains have serine or threonine in that position. 
Other A-type BIL domains have aspartate, glutamate, 
asparagine, tyrosine or alanine residues in that position, 
which are not found in any intein (Fig. 2). The C-terminal
ends of B-type BILs have a conserved position of cysteine,
serine or threonine. This could correspond to the C- 
terminal flanking position of inteins but it is not preceded 
by the histidine-asparagine motif typically found in inteins 
(Fig. 2). 
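
The sequence rule described above for A-type BILs (a C-terminal histidine-asparagine pair followed by a flanking residue that is, or is not, one of the cysteine, serine or threonine residues found in inteins) can be expressed as a small check. This is an illustrative sketch only: the function name and return strings are hypothetical, the first example segment is the C-terminal end of the P. syringae domain with its downstream threonine as shown in Fig. 4, and the second is an invented variant.

```python
# Residues found at the position following the C-terminal end of inteins (per the text).
INTEIN_C_FLANK = frozenset("CST")


def classify_a_type_c_terminus(segment: str) -> str:
    """Classify a segment ending with the BIL C-terminus plus one flanking residue.

    The segment is expected to end in 'HN' followed by a single flanking residue.
    """
    seg = segment.replace(" ", "").upper()
    if len(seg) < 3 or seg[-3:-1] != "HN":
        return "no C-terminal His-Asn motif"
    flank = seg[-1]
    if flank in INTEIN_C_FLANK:
        return f"His-Asn + {flank}: intein-like flank"
    return f"His-Asn + {flank}: flank not found in inteins"


if __name__ == "__main__":
    print(classify_a_type_c_terminus("KLKTWVHNT"))  # end of the Psy domain (Fig. 4)
    print(classify_a_type_c_terminus("KLKTWVHND"))  # invented variant with aspartate
```

Such a check only sorts sequences by the features named in the text; it does not predict whether a given domain splices or cleaves.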

Conservation of the A-type BIL C-terminal end suggests
cyclization of the C-terminal asparagine residue in the
same manner as in inteins. We showed that in an A-type
BIL the peptide bond between this asparagine and the
following threonine is cleaved. In other A-type BILs that
do not have C-terminal flanking threonine or serine residues,
asparagine cyclization might occur without trans-esterification
by the flanking residue. It is also possible
that trans-esterification occurs by mild nucleophilic residues
found in this position. In the first case the BIL would
be cleaved from its C-terminal flank and in the second
protein splicing would occur.

B-type BIL domains do not have any conserved aspar- 
agine or glutamine residue at their C-terminal end. Cleav- 
age of this end could then proceed by a mechanism 
different from the asparagine and glutamine cyclizations 
of inteins (Paulus, 2000). Alternatively, B-type BIL 
domains might not be cleaved at their C-terminal ends, 
similarly to Hedgehog Hint domains. 



Activity of BIL domains 

The P. syringae BIL domain that we cloned was active
both in a cell-free system and in vivo. In both systems the
BIL activity mainly resulted in a single cleavage at the
BIL C-terminal end. Lesser amounts of a protein-splicing
product were also produced. It is possible that N-terminal
cleavage also occurred, as M domain product was
detected in the two systems. However, in both cases no
complementing BC product was identified. If N-terminal
cleavage occurred, then its BC product was unstable in
both systems. We believe this is not very likely and consider
the M domain products to be the result of premature
transcription and/or translation stops or of protein
degradation. Our results clearly demonstrate that a BIL
domain can protein-splice and cleave its C-terminal end.
Both reactions are probably autocatalytic as they readily
occurred in a cell-free system.

The Pseudomonas syringae BIL domain has both the sequence
features and the splicing function of inteins. This indicates the
natural molecular function of at least certain BIL domains.
The function of other A-type BILs, less similar to inteins, and
of B-type BILs might be different, as discussed above. We
examined the domain function in a protein context different
from its natural setting. Inteins are generally believed
to have native activity under these conditions. Further support
for the relevance of our results to the natural BIL activity
comes from finding the same activity in both in vivo and
in vitro expression systems. Nevertheless, more experiments
in different contexts (e.g. natural flanks and natural
species) are needed to verify the 'wild type' molecular
function of various BILs.

The R. sphaeroides BIL domain we cloned was active 
in a cell free system. Our preliminary evidence indicates 
C-terminal cleavage and protein splicing showing this 
domain is active. Further experiments are needed to bet- 
ter characterize the activity of this domain. Because B- 
type BIL domains have characteristic conserved features 
this may indicate most would have some activity. 

Activity of both tested BIL domains in the cell-free system
also supports our claim for post-translational modifications
versus the alternative of RNA splicing. The latter
possibility has not been rigorously disproven but is
unlikely for several additional reasons. First, no sequence
features of any type of intron are found in the genes we
studied. The splicing junction and cleavage point of the
Psy BIL domain are also exactly those predicted from the
sequence similarity of the BIL and intein domains (see
Supplementary material Figs S1 and S2). Appearance of
both splicing and cleavage products is also often found in
studies of inteins (Paulus, 2000).

Possible biological functions of BIL domains 

Sequence similarity between BILs from the same species
(Fig. 3) and the presence of BIL gene clusters indicate
their expansion, and thus positive selection, within some
species. We propose that this selection is for BIL
function and that they do not serve as mere static 'spacer'
domains. Our demonstration of protein-splicing and cleavage
activity of the P. syringae BIL domain implies the
presence of these activities in other BILs. This stems from
the fact that residues forming the protein-splicing active
site in inteins are also conserved in most other BILs.

BILs are present in several hypervariable bacterial proteins,
such as FhaB adhesins and MafB Neisserial proteins.
Their immediate flanks are the most variable
portions of the proteins and they themselves are not 
always present in these proteins, even in closely related 
strains of the same species. Some, and perhaps all, pro- 
teins with BIL domains seem to be secreted proteins. BIL 
domains might enhance the variability of secreted pro- 
teins by their protein-splicing and cleavage activity as 
detailed below. 

Several, non-exclusive, ways by which BILs function 
can influence their host proteins are suggested here. BIL 
activity could be modulated by some external signal. Thus 
the host protein can be in two states, with and without 
the BIL domain. These signals might be conformational 
changes of the host protein or BIL domain (allosteric mod- 
ification) or change of redox environment in the protein 
surrounding. Function of BIL domains might not be limited 
to protein-splicing. They may autocatalyse their cleavage 
from their host by either N- or C-terminal autocleavage
(Fig. 1). The N-terminal ends of BIL domains, and of Hint
domains of inteins and Hog proteins are very similar 
(Fig. 2). Thus all these domains probably form labile ester 
bonds on their N-terminal ends. In proteins with BIL 
domains these ester bonds could be attacked by various 
nucleophilic molecules, that might include peptides, pro- 
teins and small reactive compounds (e.g. glutathione, cys- 
teine). Such reactions would ligate the nucleophiles to a 
C-terminal position of the host protein and release the BIL 
and the host protein region downstream to it. This is 
analogous to Hedgehog protein maturation where the Hint 
domain mediates the attachment of a cholesterol molecule
to the cleaved Hedge domain (Fig. 1). In adhesins
with BIL domains this putative ligation might serve to
covalently attach the bacterium to its adhesion target. Additionally,
released BIL and C-terminal domains could have
a function of their own. For example, in pathogenic bac- 
teria that have such proteins, the released domains could 
serve as decoys to the immune system. 

In Neisseria strains BILs appear either as short ORFs
downstream of MafB genes or in the C-terminal ends of
these proteins upstream of a variable domain. This suggests
that, at least in Neisseria, BILs function as cassettes
that can be fused to genes by genetic rearrangement
events that may promote the variability of the encoded
proteins. Other microevolutionary processes in Neisseria
and Ralstonia solanacearum, a plant pathogen bacterium
with a wide host range, are known to generate different
C-terminal ends for surface-exposed and virulence proteins
(Parkhill et al., 2000; Salanoubat et al., 2002).

Not all species with BILs are pathogens and many 
pathogenic bacteria with fully sequenced genomes do not 
have BILs. BILs might be used in processes, not con- 
nected with pathogenicity. For example, BIL activity might 
be one way for bacteria to attach to diverse surfaces. 

Conclusions 

The bacterial intein-like (BIL) domains we identified
appear to have the protein-splicing activity of inteins but
we believe their activity serves a different purpose.
Whereas inteins are selfish genetic elements we propose
that BIL domains contribute to the functionality of the
protein in which they reside by protein-splicing and/or
autoproteolysis of their host proteins. Our conclusions are
based on the types of proteins in which BIL domains
reside, the genomic and phylogenetic distribution of BIL
domains, and the protein-splicing and autoproteolytic
activity of a BIL domain.

Experimental procedures 

Data sources 

Sequence data was obtained from the NCBI non-redundant
sequence databases for Brucella melitensis 16 M,
Streptomyces coelicolor A3(2), Neisseria meningitidis
Z2491 and Neisseria meningitidis MC58 sequences; from
the NCBI microbial genome sequences database
(http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/genom_table_cgi) for
Pseudomonas syringae DC3000 (Fouts et al., 2002) (source
of preliminary sequence data from The Institute for Genomic
Research website at http://www.tigr.org); from Integrated
Genomics (http://www.integratedgenomics.com) for Rhodobacter
capsulatus SB1003 genome data (Haselkorn et al.,
2001); from Joint Genome Institute (http://www.jgi.doe.gov)
for the Rhodobacter sphaeroides 2.4.1 (Mackenzie et al.,
2001), Magnetospirillum magnetotacticum MS-1, Clostridium
thermocellum ATCC 27405 and Thermobifida fusca YX
genomic sequence data (this data has been provided freely
by the US DOE Joint Genome Institute for use in this
publication/correspondence only); from The Sanger Institute
(http://www.sanger.ac.uk) for the Neisseria meningitidis
FAM18; from University of Oklahoma, Advanced Center for
Genome Technology (http://www.genome.ou.edu) for the
Neisseria gonorrhoeae FA1090 genome sequence (GenBank
accession number for the completed Neisseria
gonorrhoeae genome is AE004969); and from Baylor College
of Medicine Human Genome Sequencing Center
(http://www.hgsc.bcm.tmc.edu) for Mannheimia haemolytica
PHL213 genomic sequence data. BIL domains were
named either by host protein name (FhaB and MafB),
arbitrarily or by their integratedgenomics database
(http://ergo.integratedgenomics.com/R_capsulatus.html) numbers
for R. capsulatus, by their Computational Biology Program
at ORNL analysis (http://genome.ornl.gov/microbial/rsph)
codes for R. sphaeroides, by their gene number for B.
melitensis strain 16 M and N. meningitidis strains MC58 and
Z2491. Available NCBI gene identifier accessions: SCP1.201
Sco - 13620683, 110519 Bme - 17988864, B0369 Nme-B -
7225591, B0372 Nme-B - 7225594, B0655 Nme-B -
7225882, A2115 Nme-A - 15794988, 00588 Rca - 7469167
(more information is provided in Supplementary material
Tables S1 and S2).

Computational sequence analysis 

Sequence searches used the blast programs for sequence to sequences searches (Altschul et al., 1997) and the blimps program for blocks to sequences searches (Henikoff et al., 1995). Block multiple sequence alignments were constructed using the blockmaker (Henikoff et al., 1995) and macaw (Schuler et al., 1991) programs as described previously (Pietrokovski, 1998). Phylogenetic analysis was done using programs from the phylip package (Felsenstein, 1989) version 3.55.

Functional assay of protein-splicing 

In order to create an assay for protein-splicing activity, a plasmid containing the genes for two protein tags was assembled. This plasmid is termed pC2C and is based on the pMALC2 vector (New England BioLabs (NEB), Beverly, MA). It contains the malE gene coding for the maltose-binding protein (MBP) and the cbd gene coding for the chitin-binding domain (CBD) from B. circulans. The chitin-binding domain was cloned by PCR from the pTYB2 vector (NEB, Beverly, MA) using the primers 5'-AAATGTCGACTGCGGTGGCCTGACC-3' and 5'-TGTCGTATTGCTTCCTTTCGGGCTT-3'. The cloned CBD sequence included the upstream linker 5'-TGCGGTGGCCTGACCGGTCTGAACTCAGGCCTC-3' and was inserted into the pMALC2 vector between the SalI and PstI restriction sites. The P. syringae (Psy) BIL domain was amplified by PCR from P. syringae DC3000 strain genomic DNA (supplied by Dr G. Sessa from Tel Aviv University, Israel) using the primers 5'-AAAAGGATCCTGCTTTGCGGCCGGAACGA-3' and 5'-AAAATCTAGAGGTATTATGCACCCATGTCTTG-3'. Polymerase chain reaction (PCR) mixtures contained Taq DNA polymerase (1 µl), Taq polymerase buffer (Sigma, St Louis, MO), 200 mM dNTP, 10 mM of each primer and 100 ng genomic DNA in a 50 µl reaction. Amplification was carried out using a Biometra thermal cycler. The BIL domain was cloned in between the BamHI and XbaI sites downstream from the malE gene and upstream from the CBD sequence. This pC2C plasmid inserted with the Psy BIL domain is termed pC2C-PsyBIL.

The R. sphaeroides (Rsp) BIL2 domain was amplified by PCR from R. sphaeroides 2.4.1 strain genomic DNA (supplied by Dr Steven L. Porter, Department of Biochemistry, University of Oxford) using the primers 5'-GAATTCGGTGATTCATCCTTGGGGCGA-3' and 5'-TCTAGAAAAACACGGCAAGGGCGAGCGG-3'. The BIL domain was cloned together with 32 amino acids at the N-terminal and 11 amino acids at the C-terminal in between the EcoRI and XbaI sites downstream from the malE gene and upstream from the CBD sequence. This pC2C plasmid inserted with the Rsp BIL2 domain is termed pC2C-RspBIL2.
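
The cloning sites named above can be cross-checked against the primer sequences. The following is a minimal sketch, not part of the original methods, that searches each primer as printed for the standard recognition sequences of BamHI (GGATCC), XbaI (TCTAGA), EcoRI (GAATTC) and SalI (GTCGAC). The dictionary names and the helper `find_sites` are illustrative assumptions.

```python
# Standard recognition sequences of the restriction enzymes considered here.
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "XbaI": "TCTAGA",
    "EcoRI": "GAATTC",
    "SalI": "GTCGAC",
}

# Primer sequences as printed in the text (5' to 3').
PRIMERS = {
    "CBD forward": "AAATGTCGACTGCGGTGGCCTGACC",
    "Psy BIL forward": "AAAAGGATCCTGCTTTGCGGCCGGAACGA",
    "Psy BIL reverse": "AAAATCTAGAGGTATTATGCACCCATGTCTTG",
    "Rsp BIL2 forward": "GAATTCGGTGATTCATCCTTGGGGCGA",
    "Rsp BIL2 reverse": "TCTAGAAAAACACGGCAAGGGCGAGCGG",
}


def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based position} for every site present in the primer."""
    primer = primer.upper()
    return {
        enzyme: primer.find(site)
        for enzyme, site in RECOGNITION_SITES.items()
        if site in primer
    }


if __name__ == "__main__":
    for name, sequence in PRIMERS.items():
        print(f"{name}: {find_sites(sequence)}")
```

Each BIL primer carries exactly one of these sites at or near its 5' end, which is consistent with the BamHI/XbaI and EcoRI/XbaI cloning described for the Psy and Rsp domains.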

Expressed fusion proteins originating from pC2C BIL plas- 
mids were termed MBC for MBP-BIL-CBD. pC2C was used 
as a control plasmid generating the MC fusion protein. 



In vitro protein translation 

In vitro transcription-translation of the proteins MBC and MC, using pC2C BIL plasmids and the pC2C plasmid, respectively, as DNA templates, was carried out using the E. coli S30 extract system for circular DNA (Promega, Madison, WI). Reactions were carried out using 0.25 mM [35S]-methionine and 220 nmol of plasmid DNA as template following the manufacturer's protocol. Reactions were incubated at 37°C for 90 min for the Psy BIL and 120 min for the Rsp BIL. Before electrophoresis, 5 µl or 10 µl of each protein sample was mixed with four volumes of acetone to remove polyethylene glycol from the sample. Acetone precipitation was followed by centrifugation at 12 000 g for 5 min. The supernatant was discarded and the pellet containing the proteins was mixed with protein loading buffer to give a final concentration of 0.06 M Tris-Cl, 2% SDS, 10% v/v glycerol, 0.01% bromophenol blue. Proteins were visualized after 10% or 7.5% SDS-PAGE by using a phosphor imaging screen. Signals were then quantified with the NIH Image 1.62 software. Product amounts are the values of three independent experiments averaged for each sample, together with the standard deviation of the means. The molar percentage of each product was calculated.
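
The text does not spell out how band signals were converted to molar percentages. A minimal sketch of one common approach is given below, under the loudly stated assumption that the signal of a methionine-labelled band is proportional to the moles of protein multiplied by its number of methionine residues. The methionine counts and band signals are hypothetical placeholders, not values from the paper.

```python
from statistics import mean, stdev


def molar_percentages(signals, met_counts):
    """Convert band signals to molar percentages.

    Assumption (not stated in the source): signal is proportional to
    moles of protein times the number of labelled methionines.
    """
    relative_moles = {name: signals[name] / met_counts[name] for name in signals}
    total = sum(relative_moles.values())
    return {name: 100.0 * value / total for name, value in relative_moles.items()}


# Hypothetical methionine counts per product (illustration only).
MET_COUNTS = {"MBC": 15, "MB": 10, "MC": 10}

# Hypothetical band signals from three independent experiments.
EXPERIMENTS = [
    {"MBC": 300.0, "MB": 200.0, "MC": 100.0},
    {"MBC": 330.0, "MB": 180.0, "MC": 100.0},
    {"MBC": 270.0, "MB": 220.0, "MC": 100.0},
]

per_experiment = [molar_percentages(e, MET_COUNTS) for e in EXPERIMENTS]

# Mean and standard deviation of each product across the three experiments.
summary = {
    name: (
        mean(p[name] for p in per_experiment),
        stdev(p[name] for p in per_experiment),
    )
    for name in MET_COUNTS
}

if __name__ == "__main__":
    for name, (average, deviation) in summary.items():
        print(f"{name}: {average:.1f} +/- {deviation:.1f} mol%")
```

Dividing by the methionine count before normalising matters because a larger product with more methionines would otherwise appear over-represented.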



In vivo expression and purification of Psy BIL 

Competent E. coli TB1 cells (NEB, Beverly, MA) were transformed with the pC2C-PsyBIL plasmid described above. The transformed cells were plated on LB agar supplemented with ampicillin (100 µg ml⁻¹). Single colonies were inoculated into 3 ml of LB medium with ampicillin (100 µg ml⁻¹). After incubation for 16 h at 37°C with shaking, 1 ml was used to inoculate a 2 L flask containing 500 ml of LB/Amp100. Incubation was continued at 37°C with shaking until the optical density (OD) at 600 nm was 0.6. Then, IPTG was added to a final concentration of 0.3 mM. After further incubation for 3 h, cells were harvested by centrifugation (5000 g, 20 min), resuspended in 20 mM Tris pH 7.4, 200 mM NaCl with a protease inhibitor cocktail (Sigma, St Louis, MO) and lysed by sonication. Lysed cells were then centrifuged at 17000 g for 20 min to remove cell debris. The supernatant was used for all further analysis. Proteins were affinity purified with either chitin beads (NEB, Beverly, MA) or amylose beads (NEB, Beverly, MA). Proteins were eluted from the beads before electrophoresis by mixing the protein-bound beads with SDS-PAGE sample loading buffer.



Antibodies 

Western blot analysis was used to identify proteins with either MBP (M) or CBD (C) tags. Monoclonal mouse anti-MBP antibodies (Novus Biologicals, Littleton, CO) were used for identification of the M-tag and polyclonal rabbit anti-CBD antibodies (NEB, Beverly, MA) were used for identification of the C-tag. Secondary antibodies used were HRP-conjugated goat anti-mouse IgG or goat anti-rabbit IgG (Jackson ImmunoResearch Laboratories, West Grove, PA).



SDS-PAGE and protein staining 

SDS-PAGE was performed as described (Laemmli, 1970). Protein samples were mixed with protein loading buffer to give a final concentration of 0.06 M Tris-Cl, 2% SDS, 10% v/v glycerol, 0.1 M DTT, 0.01% bromophenol blue. All samples were boiled for 3 min before the gel run. TriChromoRanger (Pierce, Rockford, IL) prestained markers were used to estimate protein sizes. After electrophoresis, the polyacrylamide gels were fixed in 40% methanol, 7% acetic acid and then stained by PhastGel Blue R stain (Pharmacia Biotech AB, Sweden). Gels were destained by 40% methanol, 7% acetic acid and then by deionized water. Visualized protein spots were excised using a scalpel before MALDI-TOF analysis.



Electroelution from the gel 

Electroelution was performed in a GeBAflex-tube (Gene Bio Application, Israel) at 150 V for 2 h. Elution buffer contained 0.025% SDS, Tris and Tricine, pH 8.5. Sodium dodecyl sulphate (SDS) removal after electroelution was performed by cold TCA/acetone precipitation in the presence of 0.5% sodium deoxycholate (NaDOC) (T. Mehlman and A. Shainskaya, unpublished).



In-gel digestion 

Protein bands were excised from the SDS gel stained with PhastGel Blue R stain and destained using multiple washing with 50% acetonitrile in 50 mM ammonium bicarbonate. Protein bands were subsequently reduced, alkylated and in-gel digested with either bovine trypsin (sequencing grade, Roche Diagnostics, Germany) or chymotrypsin (Boehringer Mannheim) applied at a concentration of 12.5 ng µl⁻¹ in 50 mM ammonium bicarbonate at 37°C as described (Shevchenko et al., 1996). An extracted peptide solution was dried for subsequent MALDI-MS analysis.



Mass spectrometry 

Intact molecular mass measurement and peptide mass map- 
ping were performed on a Bruker Reflex III MALDI time-of- 
flight (TOF) mass spectrometer (Bruker, Bremen, Germany) 
equipped with SCOUT source, delayed ion extraction, reflec- 
tor and a 337 nm nitrogen laser. Each mass spectrum was 
generated from accumulated data of 200 laser shots. Both 
external and nearby calibration for proteins were achieved by 
using BSA and myoglobin proteins, obtained from Sigma. For 
peptide mapping, internal calibration with molecular ions of 
regularly occurring matrix ions and peptides derived from 
trypsin was additionally performed to consolidate further pep- 
tide assignment. 




Intact molecular weight measurements by MALDI MS 

Proteins electroeluted from the gel were further purified by 
cold acetone precipitation. The dried extract from one lane of 
the gel was redissolved in 0.5 ml of 80% formic acid and 
immediately diluted with water to yield a final concentration 
of 20% formic acid. 50% of this solution was applied to a 
target plate. 



Peptide mass mapping by MALDI MS 

Aliquots of one tenth of the extracted peptide mixture volume, dissolved in 0.1% TFA or formic acid/isopropanol/water (1 : 3 : 2), were used for MALDI-MS using fast evaporation or dry droplet methods. The fast evaporation method utilized matrix surfaces made of α-cyano-4-hydroxycinnamic acid (4-HCCA) (Vorm et al., 1994; Jensen et al., 1996). The dry droplet method utilized matrix surfaces made from 2,5-dihydroxybenzoic acid (DHB) (Kussmann et al., 1997).



Acknowledgements 

We thank G. Sessa for gift of P. syringae bacterial strain, S. L. Porter for gift of R. sphaeroides bacterial strain and H. Engelberg-Kulka, G. Amitai and G. Sessa for commenting on the manuscript. Preliminary sequence data was obtained from The Joint Genome Institute (http://www.jgi.doe.gov), The Institute for Genomic Research (http://www.tigr.org),
The Sanger Institute (http://www.sanger.ac.uk), University 
of Oklahoma, Advanced Center for Genome Technology 
(http://www.genome.ou.edu), IntegratedGenomics (http:// 
www.integratedgenomics.com), and Baylor College of 
Medicine Human Genome Sequencing Center (http:// 
www.hgsc.bcm.tmc.edu). Sequencing of P. syringae DC3000 
at TIGR was accomplished with support from NSF: Plant 
Genome Program. The Gonococcal Genome Sequencing 
Project supported by USPHS/NIH grant #AI38399, and B.A. 
Roe, L. Song, S. P. Lin, X. Yuan, S. Clifton, Tom Ducey, Lisa 
Lewis and D.W. Dyer at the University of Oklahoma. The DNA 
sequence of M. haemolytica PHL213 was supported by grant 
#00-35204-9229 from USDA/NRICGP to S. Highlander and 
G. Weinstock at the BCM-HGSC. S. Pietrokovski holds the 
Ronson and Harris Career Development Chair. 



Supplementary material 

The following material is available from http://www.blackwellpublishing.com/products/journals/suppmat/mole/mole3283/mmi3283sm.htm

Fig. S1. Assignment of MALDI Peptide Mass to the MC 
ligation product (Fig. 6A). 

A. Sequences detected by MALDI analysis of the MC product 
are underlined. Twenty-five tryptic peptide masses were 
assigned to the amino acid sequence of the MC protein, 
corresponding to sequence coverage of 49%. Amino acids 
matching the C-tag protein are in italic. The double under- 
lined peptide (ISEFGSTSR-amino acids 388-396) contains 
the BIL splice site between amino acids Ser393 and Thr394. 






72 G. Amitai et al 



B. Measured and calculated masses for tryptic peptides 
which identify the 50.6 kD MC protein. 
Fig. S2. MALDI peptide mapping of the 59.3 kD MB pro- 
tein (Fig. 6A). 

A. Underlined sequences correspond to peptides detected by MALDI. Uppercase letters match amino acids of the M-tag and lowercase letters match those of the BIL domain. Note that the C-terminus of the protein, Asn541, is the penultimate C-terminal residue of the BIL sequence (Fig. 4).

B. Measured and calculated molecular masses of the two C-terminal peptides.

Table S1. BIL sequence motifs.
Table S2. BIL sequence sources. 
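
The sequence coverage quoted for Fig. S1 is the fraction of residues in the protein that fall inside at least one assigned peptide. A minimal sketch of that calculation follows. The function name and the example intervals in the usage note are illustrative assumptions, and only the junction peptide ISEFGSTSR (residues 388-396) is taken from the text.

```python
def sequence_coverage(peptides, protein_length):
    """Percent of residues covered by at least one peptide.

    peptides: iterable of (start, end) residue positions, 1-based, inclusive.
    Overlapping peptides are counted once.
    """
    covered = set()
    for start, end in peptides:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


# Junction peptide reported in Fig. S1 and its residue span.
JUNCTION_PEPTIDE = "ISEFGSTSR"
JUNCTION_SPAN = (388, 396)

if __name__ == "__main__":
    span_length = JUNCTION_SPAN[1] - JUNCTION_SPAN[0] + 1
    print(f"Junction peptide length: {len(JUNCTION_PEPTIDE)}, span: {span_length}")
```

For example, two hypothetical overlapping peptides covering residues 1-10 and 5-20 of a 40-residue protein give `sequence_coverage([(1, 10), (5, 20)], 40)`, which counts 20 distinct residues.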



References 

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.

Aspock, G., Kagoshima, H., Niklaus, G., and Burglin, T.R. (1999) Caenorhabditis elegans has scores of hedgehog-related genes: sequence and expression analysis. Genome Res 9: 909-923.

Belfort, M., and Roberts, R.J. (1997) Homing endonucleases: keeping the house in order. Nucleic Acids Res 25: 3379-3388.

Coote, J.G. (1992) Structural and functional relationships among the RTX toxin determinants of gram-negative bacteria. FEMS Microbiol Rev 8: 137-161.

Dalgaard, J.Z., Moser, M.J., Hughey, R., and Mian, I.S. (1997) Statistical modeling, phylogenetic analysis and structure prediction of a protein splicing domain common to inteins and hedgehog proteins. J Comput Biol 4: 193-214.

Felsenstein, J. (1989) PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.

Fouts, D.E., Abramovitch, R.B., Alfano, J.R., Baldo, A.M., Buell, C.R., Cartinhour, S., et al. (2002) Genomewide identification of Pseudomonas syringae pv. tomato DC3000 promoters controlled by the HrpL alternative sigma factor. Proc Natl Acad Sci USA 99: 2275-2280.

Hall, T.M., Porter, J.A., Young, K.E., Koonin, E.V., Beachy, P.A., and Leahy, D.J. (1997) Crystal structure of a hedgehog autoprocessing domain: homology between hedgehog and self-splicing proteins. Cell 91: 85-97.

Haselkorn, R., Lapidus, A., Kogan, Y., Vlcek, C., Paces, J., Paces, V., et al. (2001) The Rhodobacter capsulatus genome. Photosynthesis Res 70: 43-52.

Henikoff, S., Henikoff, J.G., Alford, W.J., and Pietrokovski, S. (1995) Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 163: 17-26.

James, R., Kleanthous, C., and Moore, G.R. (1996) The biology of E colicins: paradigms and paradoxes. Microbiology 142: 1569-1580.

Jensen, O.N., Podtelejnikov, A., and Mann, M. (1996) Delayed extraction improves specificity in database searches by matrix-assisted laser desorption/ionization peptide maps. Rapid Comms Mass Spectrometry 10: 1371-1378.

Kussmann, K., Nordhoff, E., Rahbek-Nielsen, H., Haebel, S., Rossel-Larsen, M., Jakobsen, L., et al. (1997) Matrix-assisted laser desorption/ionization mass spectrometry sample preparation techniques designed for various peptide and protein analytes. J Mass Spectrometry 32: 593-601.

Laemmli, U.K. (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227: 680-685.

Mackenzie, C., Choudhary, M., Larimer, F.W., Predki, P.F., Stilwagen, S., Armitage, J.P., et al. (2001) The home stretch, a first analysis of the nearly completed genome of Rhodobacter sphaeroides 2.4.1. Photosynthesis Res 70: 19-41.

Naumann, M., Rudel, T., and Meyer, T.F. (1999) Host cell interactions and signalling with Neisseria gonorrhoeae. Curr Opin Microbiol 2: 62-70.

Parkhill, J., Achtman, M., James, K.D., Bentley, S.D., Churcher, C., Klee, S.R., et al. (2000) Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491. Nature 404: 502-506.

Paruchuri, D.K., Seifert, H.S., Ajioka, R.S., Karlsson, K.A., and So, M. (1990) Identification and characterization of a Neisseria gonorrhoeae gene encoding a glycolipid-binding adhesin. Proc Natl Acad Sci USA 87: 333-337.

Paulus, H. (2000) Protein splicing and related forms of protein autoprocessing. Annu Rev Biochem 69: 447-496.

Pietrokovski, S. (1994) Conserved sequence features of inteins (protein introns) and their use in identifying new inteins and related proteins. Protein Sci 3: 2340-2350.

Pietrokovski, S. (1998) Modular organization of inteins and C-terminal autocatalytic domains. Protein Sci 7: 64-71.

Pietrokovski, S. (2001) Intein spread and extinction in evolution. Trends Genet 17: 465-472.

Porter, J.A., Ekker, S.C., Park, W.J., von Kessler, D.P., Young, K.E., Chen, C.H., et al. (1996a) Hedgehog patterning activity: role of a lipophilic modification mediated by the carboxy-terminal autoprocessing domain. Cell 86: 21-34.

Porter, J.A., Young, K.E., and Beachy, P.A. (1996b) Cholesterol modification of hedgehog signaling proteins in animal development. Science 274: 255-259.

Salanoubat, M., Genin, S., Artiguenave, F., Gouzy, J., Mangenot, S., Arlat, M., et al. (2002) Genome sequence of the plant pathogen Ralstonia solanacearum. Nature 415: 497-502.

Schuler, G.D., Altschul, S.F., and Lipman, D.J. (1991) A workbench for multiple alignment construction and analysis. Proteins: Structure, Function, Genetics 9: 180-190.

Shevchenko, A., Wilm, M., Vorm, O., and Mann, M. (1996) Mass spectrometry sequencing of proteins from silver-stained polyacrylamide gels. Anal Chem 68: 850-858.

Smith, A.M., Guzman, C.A., and Walker, M.J. (2001) The virulence factors of Bordetella pertussis: a matter of control. FEMS Microbiol Rev 25: 309-333.

Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994) clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acid Res 22: 4673-4680.

Vorm, O., Roepstorff, P., and Mann, M. (1994) Improved resolution and very high sensitivity in MALDI-TOF of matrix surfaces made by fast evaporation. Anal Chem 66: 3281-3287.

Xu, M.-Q., and Perler, F.B. (1996) The mechanism of protein splicing and its modulation by mutation. EMBO J 15: 5146-5153.






Table S2. BIL sequence sources.ᵃ

Name            Source    Date      Contig/Entry                      Coordinates
39_9 Tfu        JGI       1Nov00    39                                13655-15508
SCP1.201 Sco    NCBI                13620683 +32 N aa
3875_87 Mma     NCBI                21614488                          76532-75165
B0369+ Nme-B    NCBI                7225591 +34 N aa
B0372+ Nme-B    NCBI                7225594 +11 N aa
B0655+ Nme-B    NCBI                7225882 +14 N aa
A2115+ Nme-A    NCBI                15794988 +24 N aa
MafB1 Nme-C     Sanger    15May02   NmC                               1833717-1835480
BIL2 Nme-C      Sanger    15May02   NmC                               1836857-1837573
BIL3 Nme-C      Sanger    15May02   NmC                               1838418-1838981
BIL4 Nme-C      Sanger    15May02   NmC                               1839771-1840439
BIL5 Nme-C      Sanger    15May02   NmC                               627204-627920
BIL6 Nme-C      Sanger    15May02   NmC                               628395-629102
MafB1 Ngo       OU-ACGT   26Sep00   AE004969                          1560214-1561941
BIL2 Ngo        OU-ACGT   26Sep00   AE004969                          1563413-1564129
BIL3 Ngo        OU-ACGT   26Sep00   AE004969                          1565033-1565809
MafB2 Ngo       OU-ACGT   26Sep00   AE004969                          1355876-1354062
BIL5 Ngo        OU-ACGT   26Sep00   AE004969                          1351509-1350766
BIL6 Ngo        OU-ACGT   26Sep00   AE004969                          1349978-1349310
FhaB Psy        TIGR      30Aug02   5668                              5148986-5149429
FhaB Mha        BCM       4Oct01    C78-C79-C80-C81 C82-C83-C84-C85   11046-20977
BIL1 Cth        NCBI                22262155                          5528-4683
BIL2 Cth        NCBI                22262016                          2476-1667
BIL3 Cth        NCBI                22262092                          2185-2685
BIL4 Cth        NCBI                22262016                          7936-6224
BIL5 Cth        NCBI                22262016                          4623-3736
BIL6 Cth        NCBI                22262092                          1-1035
BIL7 Cth        NCBI                22262145                          1412-2059
BIL8 Cth        NCBI                22262205                          1202-462
BIL9 Cth        NCBI                22262260                          657-1
BIL10 Cth       NCBI                22262017                          34594-34986
BIL11 Cth       NCBI                22262176                          416-728
4825 Rsp        JGI       26Mar01   184                               67785-67165
BIL2 Rsp        JGI       26Mar01   177                               9673-10194
00588 Rca       NCBI                3128319
01522 Rca       IG        Dec01     1A01-1C09                         273248-279442
02710 Rca       IG        Dec01     1D09-1F02                         197288-199588
01524 Rca       IG        Dec01     1A01-1C09                         280569-281423
01523 Rca       IG        Dec01     1A01-1C09                         279638-280444
00126 Rca       IG        Dec01     2G06-2D11                         114767-113670
01216 Rca       IG        Dec01     2A12-2D05                         222590-223555
00949 Rca       IG        Dec01     2A12-2D05                         325707-326651
01374 Rca       IG        Dec01     2A12-2D05                         148470-149462
00459 Rca       IG        Dec01     2G06-2D11                         434469-435083
00460 Rca       IG        Dec01     2G06-2D11                         435094-436191
00746 Rca       IG        Dec01     2D10-2D06                         2243-3079
03530 Rca       IG        Dec01     1A01-1C09                         521700-521173
00199 Rca       IG        Dec01     2G06-2D11                         178648-177806
BIL2 Mma        NCBI                21613062                          922-1590
BIL3 Mma        NCBI                21614112                          2449-1475
BIL4 Mma        NCBI                21614173                          2216-3187
BIL5 Mma        NCBI                21612572                          3-338
BIL6 Mma        NCBI                21613847                          2033-1774
110519 Bme      NCBI                17988864


a. The sources are named as follows: JGI - Joint Genome Institute (http://www.jgi.doe.gov), NCBI - National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov), Sanger - The Sanger Institute (http://www.sanger.ac.uk), OU-ACGT - University of Oklahoma, Advanced Center for Genome Technology (http://www.genome.ou.edu), TIGR - The Institute for Genomic Research (http://www.tigr.org), BCM - Baylor College of Medicine Human Genome Sequencing Center (http://www.hgsc.bcm.tmc.edu) and IG - IntegratedGenomics (http://www.integratedgenomics.com).

Dates refer to the data release dates used. The NCBI entries were extended as noted. Coordinates of the BIL host protein ORF are given for nucleotide contigs/entries. The positions of BILs within these ORFs are listed in Table S1.
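
The coordinate pairs in Table S2 can be turned into an ORF length and an orientation. The sketch below is illustrative only. It assumes, without confirmation from the source, that a start coordinate larger than the end coordinate denotes an ORF on the reverse strand, and the function name `parse_coordinates` is hypothetical.

```python
def parse_coordinates(text):
    """Parse a 'start-end' entry into (start, end, strand, length_nt).

    Assumption: start > end means the ORF lies on the reverse strand.
    Stray spaces left over from scanning are ignored.
    """
    start_text, end_text = text.replace(" ", "").split("-")
    start, end = int(start_text), int(end_text)
    strand = "+" if start <= end else "-"
    length_nt = abs(end - start) + 1  # coordinates are inclusive
    return start, end, strand, length_nt


if __name__ == "__main__":
    # Two entries taken from Table S2: 39_9 Tfu and BIL1 Cth.
    for entry in ("13655-15508", "5528-4683"):
        print(entry, parse_coordinates(entry))
```

Both example ORFs have lengths divisible by three (1854 and 846 nucleotides), as expected for coordinates that delimit complete reading frames.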



